Improving problem-solving capabilities of language model: data, architecture and algorithms
Wang, Ziqi
Permalink
https://hdl.handle.net/2142/132794
Description
- Title
- Improving problem-solving capabilities of language model: data, architecture and algorithms
- Author(s)
- Wang, Ziqi
- Issue Date
- 2025-12-04
- Director of Research (if dissertation) or Advisor (if thesis)
- Ji, Heng
- Doctoral Committee Chair(s)
- Ji, Heng
- Zhang, Tong
- Committee Member(s)
- Peng, Hao
- Hou, Le
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Artificial Intelligence
- Language Model
- Knowledge Distillation
- Reinforcement Learning
- Transformers
- Test-Time Training
- Abstract
- Artificial intelligence (AI), particularly large language models (LLMs), has exhibited formidable problem-solving abilities across a myriad of domains. These range from constrained arenas, such as sentiment analysis, to expansive fields including coding and mathematical reasoning. Furthermore, LLMs display potential in complex scientific disciplines, encompassing Medicine, Biology, and Physics. The pressing demand for AI to expedite advancements across these domains necessitates a concentrated effort to enhance the problem-solving capabilities of LLMs. In this thesis, I elucidate the current challenges associated with improving these capabilities in language models, focusing on three critical areas: data, architecture, and algorithms. Subsequently, I present four significant contributions that address these challenges from distinct perspectives. First, I introduce a novel data augmentation methodology that performs mix-up operations within the language embedding layer of varying inputs, followed by the application of projection techniques to generate new textual inputs. This augmentation significantly elevates the performance of knowledge distillation from teacher models, consequently enhancing the capabilities of student models in addressing closed-domain challenges, specifically exemplified by the General Language Understanding Evaluation (GLUE) benchmark. Next, I shift my focus to the Transformer architecture and identify the factors contributing to position bias, a detrimental effect that impedes the reasoning capabilities of models. By eliminating position bias through the implementation of bidirectional attention mechanisms and position re-assignment strategies, I demonstrate that models achieve superior performance on downstream tasks, including applications where LLMs operate as evaluators. 
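The embedding-layer mix-up described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes mix-up interpolates the token embeddings of two inputs with a coefficient `lam`, then projects each mixed vector back to the nearest token in the embedding table to recover new textual inputs; all function and variable names are illustrative.

```python
import numpy as np

def embedding_mixup(emb_a, emb_b, embedding_matrix, lam):
    """Illustrative embedding-layer mix-up with projection back to tokens.

    emb_a, emb_b:        (seq_len, hidden) embeddings of two inputs
    embedding_matrix:    (vocab, hidden) token embedding table
    lam:                 interpolation coefficient in [0, 1]
    """
    # Interpolate the two inputs in embedding space
    mixed = lam * emb_a + (1.0 - lam) * emb_b
    # Project each mixed vector onto the most similar token embedding
    # (dot-product similarity) to obtain a new discrete input
    sims = mixed @ embedding_matrix.T  # (seq_len, vocab)
    return sims.argmax(axis=-1)
```

In practice `lam` would typically be sampled (e.g. from a Beta distribution, as in standard mix-up), and the projection step is what turns the continuous interpolation back into text the student model can be trained on.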
In terms of algorithmic advancements, I develop a self-improvement reinforcement learning algorithm that incentivizes models to produce better responses by iteratively learning from their prior outputs. Central to this algorithm is modeling the reward gap between different responses, which guides the model to generate responses superior to those of previous iterations. Finally, I propose an enhanced inference-time algorithm that incorporates test-time training and improves robustness to hyperparameter choices, spanning optimizer selection, regularization techniques, and parameter tuning. In conclusion, this thesis posits that the future trajectory of LLM development hinges on advancing reasoning capabilities through reinforcement learning, with a pronounced emphasis on self-correction and self-improvement mechanisms.
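One plausible way to model the reward gap between a new response and a prior one is a Bradley-Terry-style pairwise objective, sketched below. This is an assumption for illustration only, not the thesis's actual loss; the function name and signature are hypothetical.

```python
import math

def reward_gap_loss(reward_new, reward_old):
    """Pairwise reward-gap objective (illustrative).

    Penalizes the model unless the new response scores above the
    previous iteration's response: -log sigmoid(r_new - r_old).
    A larger positive gap yields a smaller loss, so minimizing it
    pushes each iteration to improve on the last.
    """
    gap = reward_new - reward_old
    return -math.log(1.0 / (1.0 + math.exp(-gap)))
```

Under this kind of objective, the loss at a zero gap is log 2, and it decreases monotonically as the new response pulls ahead of the old one.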
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132794
- Copyright and License Information
- Copyright 2025 Ziqi Wang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois