Towards knowledgeable natural language generation
Ge, Yubin
Permalink
https://hdl.handle.net/2142/129469
Description
- Title
- Towards knowledgeable natural language generation
- Author(s)
- Ge, Yubin
- Issue Date
- 2025-05-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Diesner, Jana
- Doctoral Committee Chair(s)
- Diesner, Jana
- Committee Member(s)
- Ji, Heng
- Hakkani-Tur, Dilek
- Zhao, Han
- Hazarika, Devamanyu
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Natural Language Generation
- Knowledgeable Text Generation
- Language
- eng
- Abstract
- Natural Language Generation (NLG) is a fundamental task in Natural Language Processing that has been studied for decades and enables the automatic generation of coherent, contextually appropriate text. The recent advent of large language models (LLMs) has further advanced the field, but challenges such as factual correctness, coherence, and adaptability to domain-specific contexts persist. This thesis explores knowledge-enhanced NLG by integrating external knowledge and leveraging training data to improve generation quality and domain adaptability.

  The first research direction investigates the incorporation of structured external knowledge into NLG models. A key application is the generation of citing sentences in academic writing, where traditional models struggle to capture the complex motivations behind citations. To address this, we propose BACO, a framework that integrates background knowledge from citation networks with the textual content of citing and cited papers. BACO improves citation intent alignment as well as the coherence and informativeness of generated citing sentences. A second application of external knowledge is follow-up question generation in conversational surveys, where knowledge-driven approaches improve the relevance, diversity, and clarity of questions. We introduce a new dataset for this task, along with a two-stage generative model and a novel set of Gricean-inspired evaluation metrics.

  The second research direction explores leveraging training data as an additional knowledge source to enhance NLG performance. In abstractive summarization, we analyze the impact of extractivity on model performance and introduce a framework that mitigates excessive copying through an integer linear programming-based optimization technique. We also develop an approach that integrates a copy mechanism into BART with an auxiliary loss function, improving summary faithfulness and generalization. In extractive summarization, we propose a fine-grained autoregressive method that uses semantic tuples derived from predicate-argument structures to improve content selection; this approach outperforms traditional sentence-level classification methods and addresses redundancy and fixed-length cutoff issues.

  By advancing methodologies in knowledge-aware NLG, this thesis contributes to the broader goal of enhancing automatic text generation with informed, accurate, and domain-sensitive content. The findings provide insights into improving model training, refining generation quality, and expanding the applicability of NLG models across diverse tasks.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129469
- Copyright and License Information
- Copyright 2025 Yubin Ge
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Computer Science
Dissertations and Theses from the Siebel School of Computer Science