Single-cell multi-omic data analysis with mathematical and statistical methods
Zhang, Shuyi
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/116075
Description
Title
Single-cell multi-omic data analysis with mathematical and statistical methods
Author(s)
Zhang, Shuyi
Issue Date
2022-07-13
Director of Research (if dissertation) or Advisor (if thesis)
Song, Jun S
Doctoral Committee Chair(s)
Golding, Ido
Committee Member(s)
Kim, Sangjin
Zhao, Sihai Dave
Department of Study
Physics
Discipline
Physics
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Sequencing analysis
Stochastic processes
Information geometry
Spectral graph theory
Abstract
Recent advances in next-generation sequencing-based single-cell technologies have allowed high-throughput quantitative detection of cell-surface proteins along with the transcriptome in individual cells, extending our understanding of the heterogeneity of cell populations in diverse tissues in different disease states or under different experimental conditions. In particular, the cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) technology yields count data of surface proteins that allow for immunophenotyping of cells yet pose new computational challenges, and there is currently a dearth of rigorous mathematical tools for analyzing these data. In this thesis, we seek to address three issues in data analysis for CITE-seq, namely, removing the systematic biases between samples, calling true signals from noise, and merging information from multiple modalities. First, we utilize concepts and ideas from Riemannian geometry to remove batch effects between samples. Subsequently, we develop a framework for distinguishing positive signals from background noise using statistical inference and multiple testing. Lastly, we use the ideas of Hamiltonian operators and density matrices from physics and introduce a unified graph-based learning scheme for effectively merging information from multiple modalities. The strengths of these approaches are demonstrated on CITE-seq data sets of mouse and human tissue samples. Together, the geometrical methods for batch correction, the statistical methods for signal detection, and the graph-based methods for merging modalities introduced in this thesis provide promising frameworks, based on ideas from mathematics, statistics, and physics, for analyzing the multi-omic data generated using the CITE-seq technology.