Quantifying and summarizing tumor phylogeny solution spaces
Qi, Yuanyuan
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with your NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/125664
Description
Title
Quantifying and summarizing tumor phylogeny solution spaces
Author(s)
Qi, Yuanyuan
Issue Date
2024-06-11
Director of Research (if dissertation) or Advisor (if thesis)
El-Kebir, Mohammed
Doctoral Committee Chair(s)
El-Kebir, Mohammed
Committee Member(s)
Warnow, Tandy
Milenkovic, Olgica
Oesper, Layla
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
cancer
intra-tumor heterogeneity
consensus
infinite-sites assumption
Abstract
Cancer phylogenies are crucial for understanding tumor development and have significant clinical applications. However, due to the heterogeneity of cancer cells and the limitations of current sequencing technologies, it is impractical to conclusively determine a single tree. Despite this, downstream analysis typically requires a single tree or a small number of trees per patient. As a result, cancer phylogeny inference methods that aim to enumerate all plausible trees quickly become unscalable as the number of mutations grows. Similarly, methods that attempt to sample high-likelihood phylogenies, often based on Markov chain Monte Carlo (MCMC) techniques, exhibit biases in their sampling results. Another approach involves summarizing multiple possible trees into one or a few trees. However, these methods rely heavily on the quality of the given trees, which, as mentioned earlier, are challenging to enumerate or sample accurately. This thesis addresses these challenges from three aspects.
In the first part, we delve into the challenges of cancer phylogeny inference using bulk data. More specifically, we study the hardness of enumerating and sampling cancer phylogenies. We illustrate how the number of possible phylogenies grows exponentially. Additionally, we show that current sampling methods exhibit bias in their sampling results. Furthermore, we provide a theoretical proof of the complexity of uniform sampling. This work establishes theoretical foundations for phylogeny inference from bulk data.
In the second and third parts, we focus on the problem of summarizing a given set of possible phylogenies with one or a few trees. In the second part, we generalize the problem of inferring a single consensus tree to inferring multiple consensus trees. We delve into the complexity of this problem and propose two methods to address it. We show that multiple consensus trees are more capable and provide a better summary than a single consensus tree. In the third part, we explore the single consensus tree problem under a different distance measure that provides better resolution. We establish the NP-hardness of this problem under the specified distance measure.
In the fourth part, we address the challenge by directly summarizing the solution space from bulk data with backbone trees. We introduce a novel method for inferring backbone trees, aimed at efficiently summarizing the solution space. We demonstrate that these backbone trees offer summarization results comparable to those of existing methods. Furthermore, we extend the method to expand these backbone trees into full trees. Our findings reveal that the full trees generated by this expansion process are of higher quality than those produced by current tree inference methods.