Files in this item: Methods for Clu ... y Gene Expression Data.pdf (PDF, 912 kB)

Title: Methods for Cluster Analysis and Validation in Microarray Gene Expression Data
Author(s): Kosorukoff, Alexander Lvovich
Subject(s): cluster analysis
Abstract: Motivation: Unsupervised learning, or clustering, is frequently used to explore gene expression profiles for insight into both regulation and function. However, the quality of clustering results is often difficult to assess, and each algorithm has tunable parameters, often with no obvious way to choose appropriate values. Most algorithms also require the number of clusters to be specified in advance, yet this value is rarely known and is therefore chosen by subjective criteria. Here we present a method that addresses these challenges systematically using statistical evaluation.

Method: The method compares the quality of clustering results in order to choose, by objective criteria, the most appropriate algorithm, distance metric and number of clusters for gene network discovery. In brief, two quality assessment metrics are used: the Consensus Share (CS) and the Feature Configuration Statistic (FCS). CS is the percentage of genes (not gene pairs) that are identically clustered across several clusterings, and FCS is a measure of the randomness of the observed configuration of transcription factor binding sites among clustered genes.

Results: We evaluate this method using both artificial and yeast microarray data. By choosing parameter settings that minimize FCS values and maximize CS values, we show major advantages over other clustering methods, in particular for identifying combinatorially regulated groups of genes. The results show remarkable enrichment for cis-regulatory elements in clusters of genes known to be regulated by such elements, as well as evidence of extensive combinatorial regulation. Moreover, the method can be generalized to cases where prior information about cis-regulatory sites is absent, or where it is desirable to calculate FCS values based on functional categorization.
Issue Date: 2006-05
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2006-2700
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-21
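The Consensus Share described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that this record does not state: each clustering is a list of cluster labels aligned to the same gene order, and a gene counts as "identically clustered" when the set of genes sharing its cluster is the same in every clustering, which makes the score independent of how labels are numbered. The record does not define how FCS is computed, so the second function is a plainly swapped-in stand-in, not the report's FCS: an upper-tail hypergeometric p-value for how unlikely the number of genes carrying one binding site in a cluster would be under random assignment. All function names are hypothetical.

```python
from math import comb
from typing import Hashable, Sequence


def co_members(labels: Sequence[Hashable]) -> list[frozenset[int]]:
    """For each gene index, return the set of gene indices sharing its cluster label."""
    groups: dict[Hashable, set[int]] = {}
    for gene, label in enumerate(labels):
        groups.setdefault(label, set()).add(gene)
    return [frozenset(groups[label]) for label in labels]


def consensus_share(clusterings: Sequence[Sequence[Hashable]]) -> float:
    """Hypothetical CS sketch: percentage of genes whose co-members agree in every clustering.

    Assumption: "identically clustered" means the gene has exactly the same
    set of co-clustered genes in all clusterings (label numbering is ignored).
    """
    if not clusterings:
        raise ValueError("need at least one clustering")
    n_genes = len(clusterings[0])
    if n_genes == 0 or any(len(c) != n_genes for c in clusterings):
        raise ValueError("clusterings must label the same non-empty gene list")
    members = [co_members(c) for c in clusterings]
    # Count genes whose co-member set matches the first clustering in every other one.
    stable = sum(
        1
        for g in range(n_genes)
        if all(m[g] == members[0][g] for m in members[1:])
    )
    return 100.0 * stable / n_genes


def enrichment_p_value(n_total: int, n_with_site: int, cluster_size: int, hits: int) -> float:
    """Stand-in for a randomness measure (NOT the report's FCS).

    Upper-tail hypergeometric probability P(X >= hits): the chance that a random
    cluster of `cluster_size` genes, drawn from `n_total` genes of which
    `n_with_site` carry a binding site, contains at least `hits` site-carrying genes.
    """
    upper = min(n_with_site, cluster_size)
    favourable = sum(
        comb(n_with_site, i) * comb(n_total - n_with_site, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(n_total, cluster_size)
```

For example, `consensus_share([[0, 0, 1, 1, 2], [1, 1, 0, 0, 0]])` returns `40.0`: genes 0 and 1 share a cluster in both clusterings, while genes 2, 3 and 4 change co-members. Under this reading, a parameter sweep would favour settings with higher CS and, for the stand-in, smaller p-values, mirroring the abstract's "maximize CS, minimize FCS" criterion only in spirit.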