Towards knowledgeable natural language generation
Ge, Yubin
Permalink
https://hdl.handle.net/2142/129469
Description
- Title
- Towards knowledgeable natural language generation
- Author(s)
- Ge, Yubin
- Issue Date
- 2025-05-01
- Director of Research (if dissertation) or Advisor (if thesis)
- Diesner, Jana
- Doctoral Committee Chair(s)
- Diesner, Jana
- Committee Member(s)
- Ji, Heng
- Hakkani-Tur, Dilek
- Zhao, Han
- Hazarika, Devamanyu
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Natural Language Generation
- Knowledgeable Text Generation
- Language
- eng
- Abstract
- Natural Language Generation (NLG) is a fundamental task in Natural Language Processing that has been studied for decades and enables the automatic generation of coherent, contextually appropriate text. The recent advent of large language models (LLMs) has further advanced the field, but challenges such as factual correctness, coherence, and adaptability to domain-specific contexts persist. This thesis explores knowledge-enhanced NLG by integrating external knowledge and leveraging training data to improve generation quality and domain adaptability.

  The first research direction investigates the incorporation of structured external knowledge into NLG models. A key application is the generation of citing sentences in academic writing, where traditional models struggle to capture the complex motivations behind citations. To address this, we propose BACO, a framework that integrates background knowledge from citation networks with the textual content of citing and cited papers. BACO improves citation intent alignment as well as the coherence and informativeness of generated citing sentences. A second application of external knowledge is follow-up question generation in conversational surveys, where knowledge-driven approaches improve the relevance, diversity, and clarity of questions. We introduce a new dataset for this task, along with a two-stage generative model and a novel set of Gricean-inspired evaluation metrics.

  The second research direction explores leveraging training data as an additional knowledge source to enhance NLG performance. In abstractive summarization, we analyze the impact of extractivity on model performance and introduce a framework that mitigates excessive copying through an integer linear programming-based optimization technique. We also develop an approach that integrates a copy mechanism into BART with an auxiliary loss function, improving summary faithfulness and generalization. In extractive summarization, we propose a fine-grained autoregressive method that uses semantic tuples derived from predicate-argument structures to improve content selection; this approach outperforms traditional sentence-level classification methods and addresses redundancy and fixed-length cutoff issues.

  By advancing methodologies in knowledge-aware NLG, this thesis contributes to the broader goal of enhancing automatic text generation with informed, accurate, and domain-sensitive content. The findings provide insights into improving model training, refining generation quality, and expanding the applicability of NLG models across diverse tasks.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129469
- Copyright and License Information
- Copyright 2025 Yubin Ge
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Computer Science
Dissertations and Theses from the Siebel School of Computer Science