Natural language processing for supporting impact assessment of funded projects
Han, Kanyao
Permalink
https://hdl.handle.net/2142/129870
Description
- Title
- Natural language processing for supporting impact assessment of funded projects
- Author(s)
- Han, Kanyao
- Issue Date
- 2025-07-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Diesner, Jana
- Doctoral Committee Chair(s)
- Diesner, Jana
- Committee Member(s)
- Schneider, Jodi
- Kilicoglu, Halil
- Miller, Daniel C.
- Department of Study
- Information Sciences
- Discipline
- Information Sciences
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Natural Language Processing (NLP)
- Impact Assessment
- Text Mining
- Information Extraction
- Funder Name Disambiguation
- Funding Analysis
- Domain-Specific NLP
- Low-Resource NLP
- Abstract
- Funding from organizations like the U.S. National Science Foundation plays a crucial role in supporting researchers and practitioners in advancing scientific knowledge, promoting societal progress, and protecting the environment, among other goals. As a result, both organizations and researchers are keen to understand how such funding is distributed across various projects and disciplines, as well as the outcomes and impacts generated by these projects. A comprehensive analysis of diverse text-based data sources that document funding allocations, research outcomes, and broader impacts can help deepen this understanding. These data sources include project reports submitted to funders as well as outcomes published in research articles. However, annotating and analyzing text-based data, even at moderate volumes, can be time-consuming and costly. Researchers must process lengthy and large-scale datasets to identify meaningful information for analysis. This dissertation aims to leverage computational methods, particularly from the fields of Natural Language Processing (NLP) and Machine Learning (ML), to assist researchers and practitioners in managing text-based data more efficiently and effectively. By automating or semi-automating processes such as information extraction, data cleaning, and classification, this work seeks to reduce the workload associated with data processing and annotation. This dissertation explores how NLP and ML techniques can be developed and used to handle data from social and scientific research under three challenging conditions: (1) disorganized, complex, lengthy, or incomplete datasets; (2) limited availability of annotated data; and (3) the need for domain-specific analysis schemas. By addressing these challenges, this dissertation aims to develop innovative approaches to aid in the analysis of funding allocation and the assessment of the impact of funded projects, with three studies being presented. 
First, analyzing past funding allocations can offer valuable insights into funding patterns in previous research. However, such analyses are often hindered by inconsistent and ambiguous naming conventions for funding organizations in publication records. This dissertation proposes a framework for fine-tuning a model to disambiguate funder names.
Second, categorizing project reports can provide valuable insights into how funding is allocated across different project themes. Despite the availability of various categorization methods that typically require large volumes of annotated data for model fine-tuning or training, little is known about how to build effective models when: (a) categorizing texts demands substantial domain expertise and/or detailed reading; (b) only a limited number of annotated documents are available for training; and (c) no relevant computational resources, such as effective pre-trained models, exist. This dissertation introduces and evaluates a categorization method that combines expert knowledge with computational models to develop domain-specific categorization models.
Third, with funding agencies increasingly demanding evidence of the social impact of scientific research, impact assessment has become critical. However, challenges remain in categorizing research reports due to the absence of a comprehensive impact classification schema and standardized reporting formats across domains. This dissertation addresses these gaps by developing and evaluating a classification schema for assessing the impact of funded research projects across domains, assisted by NLP and ML techniques.
This dissertation advances knowledge by (1) developing novel frameworks for cleaning, annotating, and extracting valuable information from publication records and project reports; (2) providing insights into funding allocation in scientific research and biodiversity conservation; and (3) enhancing the understanding of the impacts described by funded projects.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129870
- Copyright and License Information
- Copyright 2025 Kanyao Han
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois