Predicting protein-DNA interactions using statistical modeling and biophysical representations of high-throughput sequencing data
Kim, Somang
Permalink
https://hdl.handle.net/2142/132458
Description
- Title
- Predicting protein-DNA interactions using statistical modeling and biophysical representations of high-throughput sequencing data
- Author(s)
- Kim, Somang
- Issue Date
- 2025-08-27
- Director of Research (if dissertation) or Advisor (if thesis)
- Song, Jun S.
- Doctoral Committee Chair(s)
- Aksimentiev, Aleksei
- Committee Member(s)
- Perez-Pinera, Pablo
- Kim, Sangjin
- Department of Study
- Physics
- Discipline
- Physics
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- DNMT
- Prime editing
- TERT
- GABP
- Abstract
- Excluding water, proteins are the most abundant component of the cell, and they participate in almost every biological process. Their functions include gene regulation, metabolic catalysis, molecular transport, immune defense, and other critical roles. These diverse capabilities arise from their intricate interactions and cooperation with other cellular complexes. This study primarily examines three distinct proteins, or protein complexes, and their interactions with DNA by applying statistical modeling to high-throughput sequencing data. First, we investigate the action of DNA methyltransferases in a yeast species that is naturally free of DNA methylation. Using bisulfite-seq and RNA-seq, we measure DNA methylation rates and the impact of induced methylation on gene expression. To understand the behavior of DNA methyltransferases within the three-dimensional organization of DNA, we build a convolutional neural network that predicts the methylation rates of DNA sites, and we extract the sequence features associated with a higher likelihood of methylation. We further apply Fourier transforms and mutual information analysis to infer the chromatin structure that influences the interaction between DNA methyltransferases and DNA. Second, we identify conditions under which prime editors, ribonucleoprotein complexes that can target and edit specific DNA sites, operate more efficiently. Prime editing proceeds through several steps, including RNA-DNA binding, DNA nicking, reverse transcription, and DNA repair. Predicting its efficiency requires accounting for numerous factors, so we apply multiple statistical models to dissect these factors and identify the most salient features. Based on our findings, we provide guidelines for choosing optimal prime editor designs. Lastly, we study GABP, a transcription factor that can immortalize cancer cells by activating TERT when specific mutations are present in the TERT promoter. We perform and analyze RNA-seq and chromatin immunoprecipitation sequencing (ChIP-seq) of GABPA and histone modifications in cancer cell lines carrying TERT promoter mutations, and we demonstrate that GABP activates TERT by remodeling the chromatin structure of the TERT promoter region.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132458
- Copyright and License Information
- Copyright 2025 Somang Kim
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois