Probabilistic subclonal reconstruction for cancer
Department of Study: Electrical & Computer Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Tumor phylogenetics, Probabilistic modeling
Cancer consists of genetically heterogeneous populations of cells that arise through a process of subclonal evolution. Reconstructing the evolutionary processes that give rise to cancer can help us better understand cancer progression and prioritize treatment targets. Subclonal reconstruction reveals which mutations co-occur within the same subclone, the proportion of cells belonging to each subclone, and the ancestral relationships among subclones. This evolutionary process can be described by inferring tumor phylogenetic trees. Most current approaches either perform mutation clustering or tree inference in isolation, or rely on computationally expensive algorithms to perform clustering and tree inference jointly.
In this dissertation, we formalize the problem of reconstructing the subclonal structure of cancer via probabilistic modeling. Using variant and total read counts obtained from bulk DNA sequencing data as input, we introduce a tree-constrained binomial mixture model and an expectation-maximization (EM) method to estimate the cluster assignment of each mutation and the underlying frequency of each cluster. Our EM algorithm employs a linear programming approach to accurately maximize the likelihood bound subject to tree constraints. We select the optimal tree topology by repeating this procedure for every possible tree topology. Compared to existing work, the resulting ClusTree algorithm more accurately identifies mutation clusters, estimates the frequency of each cluster, and detects the proper tree topology, especially for low-depth sequencing data.
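To make the EM idea in the abstract concrete, here is a minimal Python sketch of EM for a plain binomial mixture over variant and total read counts. It is an illustration under stated assumptions, not the ClusTree algorithm: it omits the tree constraints and the linear-programming step entirely, so the M-step is the ordinary unconstrained closed-form update. The function names (`log_binom_pmf`, `em_binomial_mixture`), the mixing weights, and the toy read counts are all hypothetical and do not come from the dissertation.

```python
import math


def log_binom_pmf(v, d, f):
    """Log-probability of v variant reads out of d total reads at frequency f."""
    f = min(max(f, 1e-9), 1.0 - 1e-9)  # clamp so the logarithms stay finite
    log_coeff = math.lgamma(d + 1) - math.lgamma(v + 1) - math.lgamma(d - v + 1)
    return log_coeff + v * math.log(f) + (d - v) * math.log(1.0 - f)


def em_binomial_mixture(variant, total, init_freqs, n_iter=100):
    """Unconstrained EM for a binomial mixture (assumption: no tree constraints).

    Returns (cluster frequencies, mixing weights, hard cluster labels).
    """
    k = len(init_freqs)
    n = len(variant)
    freqs = list(init_freqs)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each cluster for each mutation,
        # computed in log space and normalized with the max-subtraction trick.
        resp = []
        for v, d in zip(variant, total):
            logs = [math.log(weights[j]) + log_binom_pmf(v, d, freqs[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(x - top) for x in logs]
            norm = sum(exps)
            resp.append([e / norm for e in exps])
        # M-step: responsibility-weighted variant reads over total reads.
        # A tree-constrained method would replace this closed-form update.
        for j in range(k):
            weighted_variant = sum(r[j] * v for r, v in zip(resp, variant))
            weighted_total = sum(r[j] * d for r, d in zip(resp, total))
            if weighted_total > 0:
                freqs[j] = weighted_variant / weighted_total
            # Floor the weight so math.log never receives zero.
            weights[j] = max(sum(r[j] for r in resp) / n, 1e-12)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return freqs, weights, labels


# Toy data (made up for illustration): eight mutations at depth 100.
variant_reads = [10, 12, 9, 11, 40, 38, 42, 41]
total_reads = [100] * 8
freqs, weights, labels = em_binomial_mixture(variant_reads, total_reads, [0.2, 0.3])
print(freqs, labels)
```

In this sketch each cluster frequency is updated independently. A tree-constrained variant would instead have to choose all frequencies jointly so that they remain consistent with a given topology, which is the role the abstract assigns to the linear program.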