Structure-enhanced text mining for science
Zhang, Yu
Permalink: https://hdl.handle.net/2142/127174
Description
- Title: Structure-enhanced text mining for science
- Author(s): Zhang, Yu
- Issue Date: 2024-11-04
- Director of Research (if dissertation) or Advisor (if thesis): Han, Jiawei
- Doctoral Committee Chair(s): Han, Jiawei
- Committee Member(s): Abdelzaher, Tarek; Tong, Hanghang; Wang, Wei; Shen, Zhihong
- Department of Study: Computer Science
- Discipline: Computer Science
- Degree Granting Institution: University of Illinois at Urbana-Champaign
- Degree Name: Ph.D.
- Degree Level: Dissertation
- Keyword(s): Text Mining; Natural Language Processing; Artificial Intelligence for Science
- Abstract:
  Language models pre-trained on large scientific text corpora (e.g., research papers, electronic health records, and encyclopedia articles) have achieved remarkable success in many scientific text mining tasks. Meanwhile, text is usually accompanied by various types of structural signals, such as paper metadata, concept ontologies, and citation networks, that can potentially benefit scientific literature understanding. To enhance the effectiveness of scientific text mining methods, my doctoral research focuses on teaching language models to exploit structural information for fundamental and advanced domain-specific applications in science, with an emphasis on understanding and augmenting scientific discovery. This thesis summarizes my endeavors spanning the following three key subtopics:
  1. Structure-Aware Fine-Grained Scientific Paper Classification. Automatically indexing scientific papers in a multi-dimensional and multi-granularity topic space not only facilitates flexible bibliographic exploration and analysis but also benefits a wide range of scientific applications. I have developed effective and efficient approaches that perform multi-label paper classification in an extremely large and multi-faceted label space (e.g., with 10,000-100,000 fields-of-study) by utilizing paper metadata, label taxonomy, heterogeneous information networks, and structures within full-text articles.
  2. Structure-Aware Scientific Topic Discovery. Mining textual and structural topic-indicative signals (e.g., keywords, entities, metadata, and their combinations) has crucial applications in scientific discovery, such as finding biomarker proteins for different disease categories. I have proposed performant approaches for extracting label-indicative keywords and structural information, which enables the retrieval and synthesis of pseudo-labeled training samples and significantly enriches supervision in zero-shot and weakly supervised text mining.
  3. Structure-Enhanced Language Model Pre-training for Scientific Applications. Scientific pre-trained language models can work alongside humans throughout the scientific discovery process by detecting and explaining relevant literature, unveiling knowledge structures, and evaluating research outcomes. I have systematically explored how to integrate fundamental scientific text mining tasks (e.g., paper classification, citation prediction, and literature retrieval) into language model pre-training to facilitate sophisticated applications, such as patient-to-article matching and peer review assignment.
  These efforts collectively pave the way for intelligent structure-enhanced text mining frameworks that process, utilize, and analyze scientific text data to accelerate science and innovation.
- Graduation Semester: 2024-12
- Type of Resource: Thesis
- Handle URL: https://hdl.handle.net/2142/127174
- Copyright and License Information: Copyright 2024 Yu Zhang
Owning Collections
- Graduate Dissertations and Theses at Illinois (Primary)