Integrating natural language with molecular structure
Edwards, Carl
Permalink
https://hdl.handle.net/2142/132486
Description
- Title
- Integrating natural language with molecular structure
- Author(s)
- Edwards, Carl
- Issue Date
- 2025-12-04
- Director of Research (if dissertation) or Advisor (if thesis)
- Ji, Heng
- Doctoral Committee Chair(s)
- Ji, Heng
- Committee Member(s)
- Han, Jiawei
- Zhai, ChengXiang
- Burke, Martin D
- Cho, Kyunghyun
- Scalia, Gabriele
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- molecule-language multimodality
- natural language processing
- AI4Science
- NLP4Science
- molecule tokenization
- molecule generation
- LLM
- chemical language model
- drug synergy prediction
- AI for scientific discovery
- cross-modal learning
- molecule-text alignment
- chemical language models
- molecule captioning
- in-context molecular learning
- text-to-molecule generation
- text-guided molecule generation
- Abstract
- The world faces unprecedented challenges in the coming decades across climate change, healthcare, and food security, each requiring innovative scientific solutions that are scalable, adaptable, and cost-effective. Further, we need to develop these solutions quickly. Broadly speaking, chemistry can provide molecular solutions to many of these problems: breakthrough drugs (e.g., kinase inhibitors), materials (e.g., organic photovoltaics), and chemical processes. The extremely large search spaces in which these solutions exist make AI tools critical for finding them. Of particular note, multimodal models combining natural, human language with molecular structure are poised to be a critical tool for discovering these solutions. In this dissertation, we will focus on enabling natural language processing, and particularly natural language itself, to serve as a tool for discovering and accelerating molecular solutions to critical global challenges. One of the first questions that probably comes to mind is why we would want to integrate natural language with molecules. Succinctly, combining these types of information has the potential to accelerate scientific discovery. As motivating scenarios, imagine a future where a doctor can receive a novel, patient-specific drug necessary to treat an ailment just by writing a few sentences describing the patient’s symptoms (also taking into account their genotype, phenotype, and medical history). Or, imagine a scientist tackling challenging problems by designing a molecule satisfying desired functions (e.g., an antimalarial or a photovoltaic) rather than its structure or low-level properties (e.g., solubility). Controlling molecules and drug design in such a high-level manner has the potential to be hugely impactful, but it requires a method of abstract description; luckily, humans have already developed one: natural language.
As this research direction is a relatively new undertaking, we will focus on the development of several new tasks which are critical for its progress. These include molecule captioning, text-based molecule generation, and inverse-synergistic drug structure design via in-context learning. Further, we will focus on three key advantages of natural language in molecule design: functionality, abstraction, and composition. We explore both general approaches and specific applications to kinase inhibitor discovery, molecule property prediction, and drug synergy prediction. Finally, we conclude by proposing a modular chemical language model which is both synthesis- and function-aware. In particular, this model integrates lessons learned during earlier chapters by proposing a flexible, inference-time chemical vocabulary which can be adapted to a wide variety of synthesis platforms.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132486
- Copyright and License Information
- Copyright 2025 Carl Edwards
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois