Files in this item

Files:AGUSE-THESIS-2020.pdf (3MB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Methods to summarize and reduce the solution space of tumor phylogeny inference
Author(s):Aguse, Nuraini Binti
Advisor(s):El-Kebir, Mohammed
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Tumor Phylogeny; Summary; Single-cell sequencing
Abstract:Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, existing cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analysis, one can either summarize the set of cancer phylogenies or use additional data to eliminate trees and further reduce the solution space. Current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. On the other hand, while single-cell sequencing (SCS) provides the data needed to reduce the solution space, it may become prohibitively costly as the number of cells to sequence increases. In this thesis, we first introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster the trees in the solution space and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP) and a heuristic algorithm that efficiently identifies high-quality consensus trees. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space. Next, we introduce PhyDOSE, a method that uses bulk sequencing data to strategically optimize the design of follow-up single-cell sequencing experiments. We incorporate distinguishing features (features that uniquely identify a tree) into a probabilistic model that infers the number of cells to sequence so as to confidently reconstruct the phylogeny of the tumor. We validate PhyDOSE using simulations and a retrospective analysis of a childhood leukemia patient, concluding that PhyDOSE's computed number of cells resolves tree ambiguity even in the presence of typical single-cell sequencing errors. We also conduct a retrospective analysis on an acute myeloid leukemia cohort, demonstrating the potential for a significant reduction in the number of cells to sequence. In a prospective analysis, we demonstrate that only a small number of cells suffices to disambiguate the solution space of trees in a recent lung cancer cohort. Finally, we provide an R package and web interface to make PhyDOSE easier to use.
Issue Date:2020-05-12
Type:Thesis
URI:http://hdl.handle.net/2142/108034
Rights Information:Copyright 2020 Nuraini Aguse
Date Available in IDEALS:2020-08-26
Date Deposited:2020-05

