Improving problem-solving capabilities of language model: data, architecture and algorithms
Wang, Ziqi
Permalink
https://hdl.handle.net/2142/132794
Description
- Title
- Improving problem-solving capabilities of language model: data, architecture and algorithms
- Author(s)
- Wang, Ziqi
- Issue Date
- 2025-12-04
- Director of Research (if dissertation) or Advisor (if thesis)
- Ji, Heng
- Doctoral Committee Chair(s)
- Ji, Heng
- Zhang, Tong
- Committee Member(s)
- Peng, Hao
- Hou, Le
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Artificial Intelligence
- Language Model
- Knowledge Distillation
- Reinforcement Learning
- Transformers
- Test-Time Training
- Abstract
- Artificial intelligence (AI), particularly large language models (LLMs), has exhibited formidable problem-solving abilities across a myriad of domains. These range from constrained arenas, such as sentiment analysis, to expansive fields including coding and mathematical reasoning. Furthermore, LLMs display potential in complex scientific disciplines, encompassing Medicine, Biology, and Physics. The pressing demand for AI to expedite advancements across these domains necessitates a concentrated effort to enhance the problem-solving capabilities of LLMs. In this thesis, I elucidate the current challenges associated with improving these capabilities in language models, focusing on three critical areas: data, architecture, and algorithms. Subsequently, I present four significant contributions that address these challenges from distinct perspectives. First, I introduce a novel data augmentation methodology that performs mix-up operations within the language embedding layer of varying inputs, followed by the application of projection techniques to generate new textual inputs. This augmentation significantly elevates the performance of knowledge distillation from teacher models, consequently enhancing the capabilities of student models in addressing closed-domain challenges, specifically exemplified by the General Language Understanding Evaluation (GLUE) benchmark. Next, I shift my focus to the Transformer architecture and identify the factors contributing to position bias, a detrimental effect that impedes the reasoning capabilities of models. By eliminating position bias through the implementation of bidirectional attention mechanisms and position re-assignment strategies, I demonstrate that models achieve superior performance on downstream tasks, including applications where LLMs operate as evaluators. 
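The embedding-layer mix-up described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes mix-up interpolates the token embeddings of two inputs with a coefficient `lam`, then projects each mixed vector back to the nearest token in the embedding table to recover new textual inputs; all function and variable names are illustrative.

```python
import numpy as np

def embedding_mixup(emb_a, emb_b, embedding_matrix, lam):
    """Illustrative embedding-layer mix-up with projection back to tokens.

    emb_a, emb_b:        (seq_len, hidden) embeddings of two inputs
    embedding_matrix:    (vocab, hidden) token embedding table
    lam:                 interpolation coefficient in [0, 1]
    """
    # Interpolate the two inputs in embedding space
    mixed = lam * emb_a + (1.0 - lam) * emb_b
    # Project each mixed vector onto the most similar token embedding
    # (dot-product similarity) to obtain a new discrete input
    sims = mixed @ embedding_matrix.T  # (seq_len, vocab)
    return sims.argmax(axis=-1)
```

In practice `lam` would typically be sampled (e.g. from a Beta distribution, as in standard mix-up), and the projection step is what turns the continuous interpolation back into text the student model can be trained on.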
In terms of algorithmic advancements, I develop a self-improvement reinforcement learning algorithm that incentivizes models to produce better responses by iteratively learning from their prior outputs. Central to this algorithm is modeling the reward gap between different responses, which guides the model to generate responses superior to those of previous iterations. Finally, I propose an enhanced inference-time algorithm that incorporates test-time training and improves robustness to hyperparameter choices, spanning optimizer selection, regularization techniques, and parameter tuning. In conclusion, this thesis posits that the future trajectory of LLM development hinges on advancing reasoning capabilities through reinforcement learning, with a pronounced emphasis on self-correction and self-improvement mechanisms.
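One plausible way to model the reward gap between a new response and a prior one is a Bradley-Terry-style pairwise objective, sketched below. This is an assumption for illustration only, not the thesis's actual loss; the function name and signature are hypothetical.

```python
import math

def reward_gap_loss(reward_new, reward_old):
    """Pairwise reward-gap objective (illustrative).

    Penalizes the model unless the new response scores above the
    previous iteration's response: -log sigmoid(r_new - r_old).
    A larger positive gap yields a smaller loss, so minimizing it
    pushes each iteration to improve on the last.
    """
    gap = reward_new - reward_old
    return -math.log(1.0 / (1.0 + math.exp(-gap)))
```

Under this kind of objective, the loss at a zero gap is log 2, and it decreases monotonically as the new response pulls ahead of the old one.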
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132794
- Copyright and License Information
- Copyright 2025 Ziqi Wang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois