Files in this item

Files | Description | Format
RANA-THESIS-2021.pdf (2MB) | (no description provided) | PDF (application/pdf)
Supplementary_tables.xlsx (34kB) | (no description provided) | Microsoft Excel 2007 (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)

Description

Title: Small-sample estimation of the mutational support and the distribution of mutations in the SARS-CoV-2 genome
Author(s): Rana, Vishal
Advisor(s): Milenkovic, Olgica
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Comparative ORF study
Good-Turing estimation
Mutation rates
SARS-CoV-2 Data analysis
Small-sample support estimation
Abstract: Accurately estimating and characterizing the mutations in the viral genomes present within a population is of great importance in tracking and mitigating the spread of a virus. The task is made difficult by the lack of a sufficient number of sequenced genomes, especially during the early stages of an outbreak. We consider the problem of determining the mutational support and the distribution of mutations in the SARS-CoV-2 genome and its open reading frames (ORFs). The mutational support refers to the unknown number of sites that are mutated among all the viral strains present in a population. The support and distribution of mutations can be used to guide primer selection for RT-PCR test kits, to study the virulence of the virus, to discover adaptation mechanisms deployed by the virus to evade the host immune system, and to identify early on new strains that might be circulating in the population. We propose new state-of-the-art polynomial estimation techniques using weighted and regularized Chebyshev approximations for small-sample mutational support estimation, and we use a modified Good-Turing estimator for distribution estimation. Our differential analysis of mutations in various population subgroups (based on data retrieved from the GISAID repository) revealed several important differences, including those in the ORF6 and ORF7a regions for older versus younger patients, in the ORF1b and ORF10 regions for females versus males, and in several ORFs for Asia versus Europe and North America. We also found no significant mutations, in any of the subpopulations, in the primer regions from ORF N chosen by the CDC for RT-PCR test kits, which is important for the reliability of the test results.
Issue Date: 2021-02-23
Type: Thesis
URI: http://hdl.handle.net/2142/110417
Rights Information: Copyright 2021 Vishal Rana
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05

