Deciphering protein binding specificity with deep learning
Su, Yufeng
This item's files can only be accessed by the System Administrators group.
Permalink
https://hdl.handle.net/2142/127460
Description
- Title
- Deciphering protein binding specificity with deep learning
- Author(s)
- Su, Yufeng
- Issue Date
- 2024-11-18
- Director of Research (if dissertation) or Advisor (if thesis)
- Peng, Jian
- Doctoral Committee Chair(s)
- Peng, Jian
- Committee Member(s)
- Ma, Jianzhu
- Han, Jiawei
- El-Kebir, Mohammed
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Machine learning
- Deep learning
- Computational biology
- Abstract
- The interaction between proteins and macromolecules plays a crucial role in biological processes. For example, protein-RNA binding specificity plays a vital role in post-transcriptional gene regulation, protein-protein binding is the key to molecular recognition in the immune system, and protein-ligand binding specificity is important for determining enzyme function. It is thus important to decipher protein binding specificity. With biotechnological advances, larger datasets have enabled data-driven approaches, particularly deep learning, to become the most effective tools for studying protein binding specificity. However, the field still faces several crucial challenges. My thesis aims to present thermodynamics-inspired deep-learning solutions to these challenges and to propose novel deep-learning methods for underexplored tasks. I began by focusing on predicting RNA-binding protein (RBP) specificity. RBPs are crucial for regulating cellular RNA processes such as splicing, transport, stability, and translation. Since RBP specificity is determined by both the RNA sequence and its secondary structure, understanding these factors is essential for developing models of post-transcriptional gene regulation. To this end, I proposed a thermodynamics-inspired deep learning model that encodes both sequence and structural context. This model leverages a novel sequence-embedding convolutional neural network applied over a thermodynamic ensemble of RNA secondary structures, achieving high performance. Next, I delved into predicting mutational effects on protein-protein binding, a challenging task due to limited experimental data. Currently, the largest dataset contains only 7,085 data points and covers around 100 different protein-protein complexes, which limits the potential for supervised learning due to overfitting risks. To address this, I developed an unsupervised learning method, RDE, inspired by the thermodynamic principle that protein-protein binding often results in entropy loss at the binding interface. Following this, I explored de novo antibody design. Traditional approaches are often computationally intensive, rely heavily on existing antibody templates, and depend on expert knowledge. To overcome these limitations, I designed a diffusion-based generative model that samples both the sequences and structures of Complementarity-Determining Regions (CDRs) conditioned on antigen structures, addressing real-world application needs. Lastly, I aimed to build a general model for enzyme-substrate specificity prediction. Existing methods are typically enzyme-specific and ignore the 3D structural context of enzyme-substrate binding. To overcome this, I developed EZSpecificity, a cross-attention-empowered SE(3)-equivariant graph neural network architecture, trained on a comprehensive, custom-built database of enzyme-substrate interactions that includes predicted structural information. The trained model has been proven to work across various enzyme families without requiring additional retraining. Together, this dissertation presents a systematic exploration of the potential of deep learning to decipher protein binding specificity.
- Graduation Semester
- 2024-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/127460
- Copyright and License Information
- Copyright 2024 Yufeng Su
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois