Files in this item

File: CHRISTENSEN-DISSERTATION-2020.pdf (4MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Algorithms for phylogenetic tree correction in species and cancer evolution
Author(s): Christensen, Sarah Ashley
Director of Research: El-Kebir, Mohammed; Warnow, Tandy
Doctoral Committee Chair(s): El-Kebir, Mohammed
Doctoral Committee Member(s): Har-Peled, Sariel; Nakhleh, Luay
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): phylogenetics; evolution; gene tree correction; tumor heterogeneity
Abstract: Reconstructing evolutionary trees, also known as phylogenies, from molecular sequence data is a fundamental problem in computational biology. Classically, evolutionary trees have been estimated over a set of species, where leaves correspond to extant species and internal nodes correspond to ancestral species. This type of phylogeny is colloquially thought of as the “Tree of Life,” and assembling it has been designated as a Grand Challenge by the National Science Foundation Advisory Committee for Cyberinfrastructure. However, processes other than speciation are also shaped by evolution. One notable example is the development of a malignant tumor: tumor cells rapidly grow and divide, acquiring new mutations with each subsequent generation. Tumor cells then compete for resources, often resulting in selection for more aggressive cell types. Recent advancements in sequencing technology have rapidly increased the amount of sequencing data taken from tumor biopsies. This development has allowed researchers to attempt to reconstruct evolutionary histories for individual patient tumors, improving our understanding of cancer and laying the groundwork for precision therapy. Despite algorithmic improvements in the estimation of both species and tumor phylogenies from molecular sequence data, current approaches still suffer from a number of limitations. Incomplete sampling and estimation error can lead to missing leaves and low-support branches in the estimated phylogenies. Moreover, commonly posed optimization problems are often under-determined given the limited amount and low quality of input data, leading to large solution spaces of equally plausible phylogenies. In this dissertation, we explore current limitations in both species and tumor phylogeny estimation, drawing out similarities and highlighting key differences. We then put forward four new methods that improve phylogeny estimation by incorporating auxiliary information: OCTAL, TRACTION, PhySigs, and RECAP.
For each method, we present theoretical results (e.g., optimization problem complexity, algorithmic correctness, running time analysis) as well as empirical results on simulated and real datasets. Collectively, these methods show that we can significantly improve the accuracy of leading phylogeny estimation methods by leveraging additional signal in distinct but related datasets.
Issue Date: 2020-11-30
Type: Thesis
URI: http://hdl.handle.net/2142/109395
Rights Information: Copyright 2020 Sarah Christensen
Date Available in IDEALS: 2021-03-05
Date Deposited: 2020-12

