Files in this item

File: UILU-ENG-06-2216_DC-224 assembled.pdf (4MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression
Author(s): Ma, Yi; Derksen, Harm; Hong, Wei; Wright, John
Subject(s): Multivariate mixed data; Data segmentation; Rate distortion; Lossy data coding; Data compression; Image segmentation; Microarray data clustering
Abstract: In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression, rate distortion theory, and multiple-channel communications. We show that a deterministic segmentation is the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm to find the optimal segmentation, which does not require any prior knowledge of the number or dimension of the groups, nor does it involve any parameter estimation. Simulation results reveal intriguing phase-transition behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
Issue Date: 2006-08
Publisher: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Series/Report: Coordinated Science Laboratory Report no. UILU-ENG-06-2216, DC-224
Genre: Technical Report
Type: Text
Language: English
URI: http://hdl.handle.net/2142/99597
Sponsor: National Science Foundation / NSF CAREER DMS-0349019, NSF CAREER IIS-0347456, NSF CRS-EHS-0509151, and NSF CCF-TF-0514955; ONR YIP N00014-05-1-0633
Date Available in IDEALS: 2018-04-03
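
Illustrative sketch: The objective stated in the abstract (choose the segmentation with the shortest total coding length at a fixed distortion) can be illustrated with a short Python sketch. This record does not give the coding-length formula, so the sketch assumes a Gaussian-style estimate, L(V) = (m+n)/2 * log2 det(I + n/(eps^2 m) * Vc Vc^T) + n/2 * log2(1 + |mu|^2/eps^2), for m points of dimension n with mean mu and centered data Vc, plus -log2(m_i/m) bits per point to record group membership. The function names and the greedy pairwise merge are hypothetical choices made for illustration and should not be read as the report's exact algorithm.

```python
import numpy as np


def coding_length(X, eps):
    """Assumed estimate of bits to code the columns of X (n x m) up to distortion eps**2."""
    n, m = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    # slogdet avoids overflow in det(); the matrix is positive definite.
    _, logdet = np.linalg.slogdet(np.eye(n) + (n / (eps**2 * m)) * (Xc @ Xc.T))
    bits_shape = (m + n) / 2.0 * logdet / np.log(2.0)
    bits_mean = n / 2.0 * np.log2(1.0 + float(np.sum(mu**2)) / eps**2)
    return bits_shape + bits_mean


def group_cost(X, idx, eps):
    """Bits for one group: its data plus the membership label of each of its points."""
    m_total = X.shape[1]
    k = len(idx)
    return coding_length(X[:, idx], eps) + k * -np.log2(k / m_total)


def total_cost(X, groups, eps):
    """Total coding length of a segmentation given as lists of column indices."""
    return sum(group_cost(X, g, eps) for g in groups)


def greedy_segment(X, eps):
    """Start from singletons; repeatedly merge the pair that saves the most bits."""
    groups = [[i] for i in range(X.shape[1])]
    while len(groups) > 1:
        costs = [group_cost(X, g, eps) for g in groups]
        best_delta, best_pair = 0.0, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                delta = group_cost(X, groups[i] + groups[j], eps) - costs[i] - costs[j]
                if delta < best_delta:
                    best_delta, best_pair = delta, (i, j)
        if best_pair is None:  # no merge shortens the code, so stop
            break
        i, j = best_pair
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Synthetic demo data (not from the report): two tight clusters in the plane.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 0.1, size=(2, 15))
B = rng.normal(0.0, 0.1, size=(2, 15)) + np.array([[5.0], [5.0]])
X = np.hstack([A, B])
groups = greedy_segment(X, eps=0.5)
print(len(groups), total_cost(X, groups, eps=0.5))
```

In this sketch the only tuning knob is the distortion eps; the number of groups is not supplied in advance but falls out of where merging stops, which mirrors the abstract's claim that no prior knowledge of the number of groups is required.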

