Extracting biological information from DNA-biomolecule interactions using machine learning and statistics
Yuan, Jimmy Bo
Permalink
https://hdl.handle.net/2142/127314
Description
Title
Extracting biological information from DNA-biomolecule interactions using machine learning and statistics
Author(s)
Yuan, Jimmy Bo
Issue Date
2024-08-16
Director of Research (if dissertation) or Advisor (if thesis)
Song, Jun S
Doctoral Committee Chair(s)
Dahmen, Karin A
Committee Member(s)
Aksimentiev, Aleksei
Perez-Pinera, Pablo
Department of Study
Physics
Discipline
Physics
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Machine learning
statistics
biophysics
genomics
bioinformatics
protein-DNA interactions
genome editing
tumor biology
Abstract
DNA-biomolecule interactions, such as protein-DNA and RNA-DNA interactions, give rise to many important biological phenomena, ranging from DNA replication to gene regulation. Understanding the features that govern the success of these interactions can thus aid both the understanding of fundamental biological processes and the development of therapeutics. While the strength of these DNA-biomolecule interactions is often determined by a large number of degrees of freedom, the abundance of biological data from high-throughput sequencing experiments and the development of statistical learning algorithms have made it easier to understand these seemingly intractable biomolecular rules. This thesis represents our efforts to use large sequencing datasets in conjunction with machine learning and other statistical techniques to extract the most salient degrees of freedom in specific DNA-biomolecule interactions. Using neural networks and other learning algorithms to analyze transcription factor ChIP-seq datasets, we first explored the protein-DNA interactions of the SOX10 transcription factor and how mutations in the SOX10 DNA-binding domain alter these interactions genome-wide. Using multimodal neural networks and other statistical techniques with transcription factor ChIP-seq and RNA-seq datasets, we next investigated the role of protein-DNA interactions in the ETS transcription factor family, with an emphasis on the interactions between isoforms/paralogues of the protein GABP and the DNA in the TERT promoter. Finally, using various linear models and dimension reduction methods with amplicon sequencing and histone modification ChIP-seq datasets, we probed the RNA-DNA and protein-DNA interactions that give rise to efficient prime editors. The mechanisms discovered via these DNA-biomolecule interactions can provide insights into methods of treating various diseases, such as cancers and other genetic disorders.