Scalable foundation models
Chen, Yangyi
Permalink
https://hdl.handle.net/2142/132558
Description
- Title
- Scalable foundation models
- Author(s)
- Chen, Yangyi
- Issue Date
- 2025-12-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Ji, Heng
- Doctoral Committee Chair(s)
- Ji, Heng
- Committee Member(s)
- Zhang, Tong
- Peng, Hao
- Yang, Zhengyuan
- Ping, Wei
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- foundation models, scalable AI, multimodal
- Abstract
- The continual growth in computational resources and human annotators, driven by advances in hardware architecture and the increasing accessibility of crowdsourcing platforms, has created unprecedented opportunities for artificial intelligence (AI) models. However, without scalable AI solutions (e.g., training algorithms, model architectures), much of this additional compute and annotation may be underutilized or yield diminishing returns. Furthermore, as real-world applications demand increasingly sophisticated AI capabilities, scalable models offer a clear path to achieving higher levels of intelligence by taking full advantage of available computational and annotation resources. This makes scalability not just a technical consideration, but a fundamental requirement for advancing the field of AI in parallel with hardware developments. This dissertation investigates the fundamental trajectory toward scalable foundation models through three successive research milestones. (1) Predictable Scaling: We examine the scaling laws that govern the development of foundation models, analyzing how models' capabilities correlate with computational resources. Our research establishes principled solutions for forecasting model behaviors and resource requirements across different scales, enabling scientific and reliable scaling of AI models. (2) Scalable Modeling: We explore model architectures and training recipes optimized for multimodal learning, demonstrating how these approaches can effectively utilize increasing computational resources and data to achieve continuous performance improvements. Our findings reveal architectural principles and training strategies that maintain efficiency at scale while avoiding common bottlenecks in previous modeling strategies. (3) Scalable Oversight: We study scalable post-training approaches that enable continuous model improvement and alignment with human values even as model capabilities expand beyond human expertise.
This research introduces novel supervision techniques that scale in parallel with model complexity and capability, ensuring the responsible advancement of AI models. In Chapter 1, we describe the key research problems and preview the featured research presented in the following chapters.
Predictable Scaling. In Chapter 2, we study how to estimate the actual capabilities (i.e., downstream performance) of large language models (LLMs) by addressing the challenges posed by LLMs' emergent abilities. We focus on the pre-training loss as a more computation-efficient metric for performance estimation. We present FLP, a two-stage approach to performance prediction: first, we estimate a function that maps computational resources (e.g., FLOPs) to the pre-training Loss using a series of sampling models; then, we map the pre-training loss to downstream task Performance after the critical "emergent phase".
Scalable Modeling. In Chapter 3, we present a scalable code-guided visual representation learning method and a single-transformer architecture for scalable vision-language modeling. A single unified Transformer architecture can effectively address the scalability concerns of previous large vision-language models (LVLMs); however, its limited adoption in modern contexts likely stems from the absence of reliable training recipes that balance both modalities and ensure stable training for billion-scale models. We introduce the first open-source training recipe for developing unified LVLMs using moderate academic resources (8× A100 80GB GPUs). In addition, we revisit the next-token prediction loss in vision-language pre-training and argue that it can be a false proxy for the actual capabilities of LVLMs. We propose a new algorithm, ViStruct, to scale up vision-language pre-training. The results show that ViStruct scales better with more data and compute.
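The two-stage FLP pipeline described above can be sketched as follows. This is a minimal illustration, not the dissertation's actual implementation: the measurements, the power-law form for stage one, and the linear loss-to-performance map for stage two are all assumptions made for the example.

```python
import numpy as np

# Stage 1: hypothetical (FLOPs, pre-training loss) pairs from small sampling models.
flops = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([3.2, 2.9, 2.6, 2.3])

# Fit an assumed power law, loss ~ A * FLOPs^alpha, as a line in log-log space.
alpha, log_A = np.polyfit(np.log(flops), np.log(loss), 1)  # alpha < 0 here
predict_loss = lambda C: np.exp(log_A) * C ** alpha

# Stage 2: hypothetical (loss, downstream accuracy) pairs collected after the
# "emergent phase", where performance varies smoothly with loss.
loss_pts = np.array([2.9, 2.6, 2.3])
acc_pts = np.array([0.35, 0.48, 0.62])

# Fit an assumed linear map from pre-training loss to task performance.
slope, intercept = np.polyfit(loss_pts, acc_pts, 1)
predict_acc = lambda L: slope * L + intercept

# Chain the two stages: forecast downstream accuracy at a larger compute budget.
target_flops = 1e22
est_loss = predict_loss(target_flops)
est_acc = predict_acc(est_loss)
```

The key design point is the intermediate loss variable: compute-to-loss extrapolates smoothly even when compute-to-performance does not, so chaining the two fits sidesteps the discontinuity that emergent abilities introduce.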
Scalable Oversight. In Chapter 4, we investigate a novel approach to AI supervision through learning from AI feedback. We introduce a scalable alignment framework that harnesses the strong capabilities of large language models (LLMs) to guide the development of LVLMs. Our framework advances beyond conventional numerical reward signals by leveraging natural language feedback as the primary mechanism for model optimization and refinement. This methodology enables the systematic refinement of model responses, promoting helpfulness, truthfulness, and safety while also enhancing the models' capacity for sustained multi-turn interactions. Our approach demonstrates how advanced LLMs can serve as effective supervisors in the training pipeline, offering a scalable solution to the challenge of model alignment. In addition, we use AI feedback to supervise the reasoning consistency of LVLMs. On our curated benchmark, which targets the chain-of-thought (CoT) reasoning performance and consistency of LVLMs, the results show that supervising the reasoning process yields better reasoning capabilities in LVLMs.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132558
- Copyright and License Information
- Copyright 2025 Yangyi Chen
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)