Files in this item

SASHITTAL-THESIS-2021.pdf (PDF, 9MB)

Title: Algorithms for infection and cancer genomics
Author(s): Sashittal, Palash
Advisor(s): El-Kebir, Mohammed
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Infection genomics; Cancer genomics; Combinatorial optimization; Transcript assembly; Doublet detection
Abstract: Continuous innovations and advances in sequencing technologies have given rise to several fields of research. In this thesis, we propose four methods to address open problems in two such fields, infection genomics and cancer genomics.

The first problem we address is the reconstruction of the transmission history of an outbreak using genomic and epidemiological data collected from infected hosts. It is challenging to account for all the relevant biological processes that occur during the evolution and transmission of pathogens in an outbreak while also addressing the uncertainty in the most likely solution. Our method, TiTUS, overcomes these challenges by first sampling uniformly from the set of feasible transmission histories of the outbreak under a realistic model of evolution and transmission. It then generates a consensus-based solution that summarizes the sampled candidate solutions in a biologically meaningful way. We show that TiTUS efficiently samples the solution space, enabling accurate reconstruction of the transmission history of an outbreak.

The second method we introduce, Jumper, reconstructs viral transcripts using RNA-sequencing data from infected cells. We focus on viruses in the Coronaviridae family, such as SARS-CoV-2, which express genes through discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. The viral transcriptome provides valuable information with clinical implications, such as the differential expression of viral genes, the host cell response to viral infection, and the viral life cycle. We show that Jumper accurately infers viral transcripts, outperforming existing transcript assembly methods, and facilitates the study of coronavirus transcriptomes under varying conditions.

The third problem we address is doublet detection in single-cell DNA-sequencing data. Our method, doubletD, is the first stand-alone doublet detection method for single-cell DNA-sequencing data. We use a simple probabilistic model that admits a closed-form maximum likelihood solution and efficiently and accurately detects doublets by identifying a characteristic signal in the variant allele frequency (VAF) distribution of the data. On simulated data and multiple real datasets, we show that identifying and removing doublets with doubletD improves downstream analyses such as genotype calling and phylogeny reconstruction.

Finally, we present PACTION, a new method for the tumor phylogeny inference problem in cancer. Due to technological and methodological limitations, existing methods identify tumor clones and phylogenies based only on either small-scale mutations, such as single nucleotide variations (SNVs), or large-scale mutations, such as copy number aberrations (CNAs). This prevents a comprehensive characterization of a tumor's clonal composition. To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as a reconciliation problem. We show that PACTION reliably identifies tumor clones and their evolutionary relationships even in the presence of noise or errors in the input SNVs and CNAs.
Issue Date: 2021-07-19
Rights Information: Copyright 2021 Palash Sashittal
Date Available in IDEALS: 2022-01-12
Date Deposited: 2021-08
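According to the abstract, doubletD finds doublets through a characteristic signal in the VAF distribution. The sketch below shows why such a signal can exist. It is a hypothetical toy likelihood-ratio test and not doubletD's actual model. It assumes diploid cells with no allelic dropout and no copy-number changes. Under those assumptions, a single cell has a true VAF of 0, 0.5 or 1 at each SNV locus, while a droplet holding two cells can also show 0.25 or 0.75. Several parts of the sketch are assumptions made only for this illustration: the function names `doublet_log_ratio` and `is_doublet`, the 1% read error rate, and the uniform prior over true VAFs.

```python
import math

# Hypothetical toy model (NOT doubletD): diploid cells, no allelic dropout,
# no copy-number changes, and a uniform prior over the possible true VAFs.
ERROR_RATE = 0.01  # assumed per-read sequencing error rate
SINGLET_VAFS = (0.0, 0.5, 1.0)              # hom-ref, het, hom-alt in one cell
DOUBLET_VAFS = (0.0, 0.25, 0.5, 0.75, 1.0)  # average of two cells' genotypes


def _log_binom(k, n, p):
    """Log binomial probability of k variant reads out of n with success prob p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def _log_mixture(k, n, vafs):
    """Log-likelihood of (k, n) under a uniform mixture over the given true VAFs."""
    terms = []
    for vaf in vafs:
        # Sequencing errors keep the observed-read probability away from 0 and 1.
        p = vaf * (1.0 - ERROR_RATE) + (1.0 - vaf) * ERROR_RATE
        terms.append(_log_binom(k, n, p) - math.log(len(vafs)))
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def doublet_log_ratio(counts):
    """Sum over loci of log P(reads | doublet) - log P(reads | singlet).

    counts: iterable of (variant_reads, total_reads) pairs for one cell.
    """
    return sum(_log_mixture(k, n, DOUBLET_VAFS) - _log_mixture(k, n, SINGLET_VAFS)
               for k, n in counts)


def is_doublet(counts, threshold=0.0):
    """Call a cell a doublet when the log-likelihood ratio exceeds threshold."""
    return doublet_log_ratio(counts) > threshold


singlet_cell = [(50, 100), (1, 100), (99, 100)]   # VAFs near 0.5, 0 and 1
doublet_cell = [(25, 100), (24, 100), (76, 100)]  # VAFs near 0.25 and 0.75
print(is_doublet(singlet_cell))  # False
print(is_doublet(doublet_cell))  # True
```

A positive log-ratio favors the doublet hypothesis. A practical method would also need calibrated priors and thresholds, plus a model of allelic dropout and copy-number changes, none of which this sketch includes.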