Enhanced transcriptome profiling and biomarker discovery using meta-analytical techniques

Files in this item

File                       Description                        Format
Adams_Heather.pdf (2MB)    Dissertation                       PDF
(15MB)                     All Appendices (in zipped file)    ZIP
Appendix A.pdf (1MB)       Appendix A                         PDF
Appendix B.pdf (5MB)       Appendix B                         PDF
Appendix C.pdf (4MB)       Appendix C                         PDF
Appendix D.pdf (5MB)       Appendix D                         PDF
Appendix E.pdf (4MB)       Appendix E                         PDF
Appendix F.pdf (4MB)       Appendix F                         PDF
Appendix G.pdf (4MB)       Appendix G                         PDF
Title: Enhanced transcriptome profiling and biomarker discovery using meta-analytical techniques
Author(s): Adams, Heather A.
Director of Research: Rodriguez-Zas, Sandra L.
Doctoral Committee Chair(s): Rodriguez-Zas, Sandra L.
Doctoral Committee Member(s): Lewin, Harris A.; Robinson, Gene E.; Shanks, Roger D.
Department / Program: Animal Sciences
Discipline: Animal Sciences
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): meta-analysis gene expression microarray classification honey bee maturation embryo development
Abstract: This is a comprehensive study of the application of meta-analytical techniques to gene expression data, both to detect differentially expressed genes across experiments and conditions and to classify unknown samples. The three chapters (1) investigate three novel meta-analysis approaches for combining gene expression experiments and analyzing them for differential expression across experiments, (2) develop a novel meta-classification analysis to accurately classify samples into groups and identify candidate transcript biomarkers, and (3) apply novel meta-analysis approaches to experiments with related but different conditions. The novelty of the approaches lies in the inclusion of estimates of differential expression (effect size) together with the sign of expression (up- or down-regulation) for genes, whereas previous meta-analysis studies have combined results based on P-values alone or effect sizes alone.

Aim 1: The objectives were (1) to use different types of information to develop and compare different meta-analysis approaches that address limitations of previous microarray meta-analysis work and (2) to apply these approaches to enhance knowledge of the expression patterns among honey bees representing different age and behavior groups. A novel study-level meta-analysis approach was developed that combines estimates of fold change between groups, because in many cases only fold change results are available. The novelty of this approach stems from the adjustment of the fold change estimates by their precision and the weighting of each study by its individual variation. In recognition of the importance of within- and between-study variation seen in the study-level meta-analysis, a sample-level meta-analysis model was extended to include heterogeneity between studies. The sample-level meta-analysis combined normalized gene expression values, and the comparison of study- and sample-level meta-analyses allowed assessment of the relative advantages of using the two types of information. Lastly, the new study- and sample-level approaches were compared to the traditional approach of identifying overlapping genes among the lists of differentially expressed genes from two or more studies.

Aim 2: The objectives were (1) to develop and test complementary meta-classification approaches that integrate different types of information across studies and gain precision in the identification of gene expression biomarkers for classifying samples, (2) to apply the new approaches to identify gene expression profiles that can accurately classify two honey bee behavior groups, and (3) to compare different methods for identifying a subset of gene profiles that can provide accurate and generalizable sample classification. In the context of using gene expression information from multiple microarray experiments to classify honey bee samples into age or behavior groups, the response and explanatory variables were gene expression and group, respectively, in the previous chapter, whereas they are group and gene expression, respectively, in the present chapter. A linear mixed-effects model was used to meta-analyze the Normally distributed gene expression variable in Aim 1, and a binary logistic mixed-effects model was used to meta-classify samples in this aim. A novel three-stage strategy that allowed the identification, selection, and combination of accurate and generalizable biomarkers was developed. In the first stage, three approaches that integrate information across studies within gene were implemented: (a) individual-study analysis, (b) study-level meta-classification analysis, and (c) sample-level meta-classification analysis. Promising classifier genes for the second stage were identified based on the statistical significance and consistency of their association with the odds of behavior across analyses. In the second stage, results from the first stage were incorporated into multiple classification approaches to develop classifier functions and identify classifier genes. Four classification techniques based on the simultaneous consideration of multiple genes were developed, and selection methods were used to address potential over-parameterization of the model. In the third stage, additional biomarker gene sets with comparable classification performance were uncovered from all genes in the studies using a score selection method, which gave the flexibility to arrive at models that produced high-performing classifiers without generating an excessive number of models with low performance. Classification functions were trained on numerous datasets, which improved the ability to detect biomarkers that are strong indicators of conditional class by drawing on a larger sample of observations from the population. These biomarkers were tested on an additional dataset to assess their performance in predicting classes.

Aim 3: The objectives addressed the integration of information from studies that do not share conditions and included (1) development of a meta-analysis approach that allows information to be combined across partially overlapping studies, (2) implementation of the approach to characterize gene expression profiles in cattle embryos spanning a variety of conditions, and (3) comparison of the results of this novel approach against other standard meta-analysis approaches. To accomplish the first objective, the model developed for the sample-level meta-analysis approach in the second aim was re-parameterized. To accomplish the second objective, four cattle microarray experiments that partially overlapped in the conditions studied were meta-analyzed. The samples included 7-day embryos, 25-day embryos and extra-embryonic tissue, term (or near-term, approximately 280-day) extra-embryonic tissue, and a reference sample. In addition to developmental age and tissue source, samples pertained to one of two reproductive technologies, artificial insemination or somatic cell nuclear transfer. The availability of conditions not present in all studies allowed testing of the novel meta-analysis model. The availability of the two reproductive technologies across all studies allowed the comparison of standard and novel meta-analysis techniques. Results from this novel extension allowed the characterization of comprehensive expression profiles that extend across conditions not studied within any single experiment.
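The study-level idea described under Aim 1 (signed fold change estimates weighted by their precision, with allowance for between-study variation) can be illustrated with a minimal sketch. The abstract does not name the estimator used in the dissertation, so this example assumes standard inverse-variance weighting with a DerSimonian-Laird estimate of between-study variance. The function name `combine_fold_changes` and the input values are hypothetical and are not taken from the dissertation.

```python
import math


def combine_fold_changes(log_fc, se):
    """Pool per-study log2 fold changes for a single gene.

    Assumption: inverse-variance weights plus a DerSimonian-Laird
    between-study variance; the dissertation's actual model may differ.

    log_fc: per-study log2 fold-change estimates (sign gives direction).
    se:     per-study standard errors of those estimates (all > 0).
    Returns (pooled estimate, standard error of pooled estimate, tau2).
    """
    if len(log_fc) != len(se) or len(log_fc) < 2:
        raise ValueError("need matching inputs from at least two studies")
    if any(s <= 0 for s in se):
        raise ValueError("standard errors must be positive")

    # Fixed-effect step: weight each study by its precision (1 / variance).
    w = [1.0 / s ** 2 for s in se]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_fc)) / sum_w

    # Cochran's Q measures disagreement among studies beyond sampling error.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_fc))
    k = len(log_fc)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects step: add tau2 to each study's own variance.
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    sum_w_re = sum(w_re)
    pooled = sum(wi * y for wi, y in zip(w_re, log_fc)) / sum_w_re
    return pooled, math.sqrt(1.0 / sum_w_re), tau2


if __name__ == "__main__":
    # Hypothetical values for one gene measured in three studies.
    estimate, std_err, tau2 = combine_fold_changes(
        [0.8, 1.1, -0.2], [0.30, 0.25, 0.40]
    )
    print(f"pooled log2 FC = {estimate:.3f}, SE = {std_err:.3f}, tau2 = {tau2:.3f}")
```

Because the inputs are signed log fold changes, studies that disagree on direction pull the pooled estimate toward zero and inflate the between-study variance. An approach that combined P-values alone would not register that disagreement.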
Issue Date: 2010-05-14
Rights Information: © 2010 Heather Ann Adams
Date Available in IDEALS: 2012-05-15
Date Deposited: May 2010
