Computational tools for engineering functionally improved genetic components and cellular factories
Boob, Aashutosh Girish
This item's files can only be accessed by the System Administrators group.
Permalink
https://hdl.handle.net/2142/125766
Description
- Title
- Computational tools for engineering functionally improved genetic components and cellular factories
- Author(s)
- Boob, Aashutosh Girish
- Issue Date
- 2024-07-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhao, Huimin
- Doctoral Committee Chair(s)
- Zhao, Huimin
- Committee Member(s)
- Sinha, Saurabh
- Rao, Christopher V
- Shukla, Diwakar
- Department of Study
- Chemical & Biomolecular Engr
- Discipline
- Chemical Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Metabolic Engineering
- Bioinformatics
- Machine Learning
- Programmable Nuclease
- Organelles
- CRISPR/Cas
- Genetic Toolkit
- Synthetic Biology
- Abstract
- Synthetic biology toolkits are invaluable for strain construction and optimization, playing a crucial role in biotechnological and biomedical applications. However, traditional toolkits are composed of manually identified, well-characterized endogenous genetic elements, which limits their scope for pathway construction and optimization. An alternate, more comprehensive strategy is therefore required for toolkit design. Bioinformatics and Machine Learning (ML) models offer a systematic route for in-depth analysis of omics datasets for the discovery and design of novel genetic elements. In this dissertation, I developed various computational approaches for designing genetic components within a synthetic biology toolkit, with an emphasis on creating a versatile platform suitable for both conventional and non-model organisms. I characterized these elements in vivo for functionality across the sphere of life and showcased their applications for the rapid construction of cellular factories for biochemical production. In the initial section of this dissertation (Chapters 2, 3, and 4), I employed bioinformatics tools to conduct genome-wide searches for neutral integration sites. Leveraging the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas9 system for genome editing, I devised scripts for the discovery of intergenic sites suitable for gene integration in Issatchenkia orientalis SD108. In collaboration with Dr. Zia Fatma and Dr. Shih-I Tan, I characterized and analyzed the sites to ensure efficient genome editing. Furthermore, we engineered a landing pad system enabling multiplex engineering, facilitating the simultaneous integration of one gene at five loci or five different genes at individual loci in a single round of transformation. Finally, we demonstrated the application of the landing pad system for 5-aminolevulinic acid (5-ALA) and succinic acid production in this non-model host.
In Chapter 3, I built on our previous work in I. orientalis SD108 and introduced CRISPR-COPIES, a COmputational Pipeline for the Identification of CRISPR/Cas-facilitated intEgration Sites. I implemented ScaNN, a state-of-the-art method for embedding-based nearest-neighbor search, for fast and accurate off-target search. Additionally, I employed various design rules and added several on-target models to select gRNAs with higher editing efficiencies. Next, I integrated the Database of Essential Genes for locating stable integration sites. I optimized the pipeline for speed and demonstrated its capability to rapidly discover genome-wide intergenic sites for most bacterial and fungal genomes within minutes. As a proof of concept, I utilized CRISPR-COPIES to identify and characterize neutral integration sites in three diverse species: Saccharomyces cerevisiae, Cupriavidus necator, and HEK293T cells. Furthermore, I developed a user-friendly web interface for CRISPR-COPIES (https://biofoundry.web.illinois.edu/copies/). In Chapter 4, I employed the CRISPR-COPIES pipeline and multi-omics datasets to characterize integration sites in Sulfolobus islandicus M.16.4. Integrating genomics, transcriptomics, and chromatin datasets, I prioritized neutral integration sites for experimental validation. Collaborating with Dr. Changyi Zhang at the Carl R. Woese Institute for Genomic Biology (IGB), we employed the β-galactosidase assay to assess the sites for robust gene expression. Lastly, I outlined ongoing efforts in expressing thermostable terpene synthases and endogenous GDGT (glycerol dibiphytanyl glycerol tetraether) ring synthases for metabolic engineering. Traditional metabolic engineering studies express pathway enzymes in the cytoplasm. However, certain pathways benefit from subcellular localization, necessitating localization tags for compartmentalization. Despite this, characterization of mitochondrial targeting tags remains limited.
Therefore, in Chapter 5, I addressed this gap by using Variational Autoencoders (VAEs), an unsupervised deep learning framework, to design novel mitochondrial targeting sequences (MTSs). In silico analysis revealed that a high fraction of generated peptides are functional and possess features important for mitochondrial targeting. Additionally, I devised a sampling scheme to account for biases in interaction with import machinery and characterized artificial MTSs in four eukaryotic organisms. These functional sequences displayed significant diversity, sharing less than 60% sequence identity with MTSs in the UniProt database. Moreover, I trained a separate VAE and employed latent interpolation to design dual targeting sequences capable of targeting both mitochondria and chloroplasts, shedding light on their evolutionary origins. As a proof of concept, I demonstrated the application of these artificial MTSs in improving HEM1 delivery. Finally, I directed my focus toward designing genetic components to regulate gene expression. Recent advancements in machine learning (ML) have improved our ability to map genotype to phenotype and design genetic components with desired activity. However, most of these studies are restricted to model organisms. In Chapter 6, I introduced ML-GTF (Machine Learning-guided Genetic Toolbox for Fungi), a computational platform that leverages large language models (LLMs), ML, and genetic algorithms for designing genetic toolkits. By harnessing signatures from RNA-seq data, this methodology enabled the creation of tailored genetic components that meet the specific functional requirements of gene expression. The models exhibit good predictive accuracy, explaining 50-60% of the variance in mRNA levels. I conducted comprehensive studies for multiple fungal species, including in silico promoter design, codon optimization, and validation of previously characterized promoters in a non-model yeast, I. orientalis SD108. Moreover, I developed a streamlined, user-friendly pipeline for the rapid and efficient customization of genetic toolkits applicable to 1,500 fungal genomes spanning 806 species from the Ensembl Fungi database, requiring only RNA-seq data as input. In summary, this dissertation describes my five-year journey of implementing bioinformatics and machine learning models to design enhanced synthetic biology toolkits, offering diverse applications in biotechnology and medicine.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125766
- Copyright and License Information
- Copyright 2024 Aashutosh Boob
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois