University of Illinois Urbana-Champaign
Efficient code-specific speculative decoding: Enhancing predictive accuracy and execution speed
Wang, Yuhan
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/127359
Description
Title
Efficient code-specific speculative decoding: Enhancing predictive accuracy and execution speed
Author(s)
Wang, Yuhan
Issue Date
2024-12-09
Director of Research (if dissertation) or Advisor (if thesis)
Zhang, Lingming
Department of Study
Siebel School Comp & Data Sci
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
LLM
speculative decoding
Abstract
Large Language Models (LLMs) have gained prominence due to their remarkable performance across diverse domains, providing high-quality and contextually relevant outputs that can be easily adapted to various applications. The success of these models has captured researchers' attention, particularly regarding the trade-offs between computational cost and performance: while larger LLMs yield superior outputs compared to their smaller counterparts, they also require significantly more computational and memory resources. This resource-intensive nature poses challenges in real-world applications, as scaling these models requires extensive infrastructure that may not be accessible or sustainable for all users and applications.
To address these limitations, recent research has proposed Speculative Decoding—an approach that takes advantage of the efficiency of smaller models while preserving the performance benefits of larger ones. Speculative Decoding is based on the observation that many complex tasks can be decomposed into simpler subtasks, which both large and small models can handle with similar accuracy. By using a smaller model to perform preliminary inferences and selectively involving the larger model for more complex instances, this method significantly reduces resource consumption without sacrificing performance quality.
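The draft-then-verify loop described above can be sketched as follows. This is a minimal toy illustration, not the thesis's implementation: the "models" are hypothetical deterministic next-token functions standing in for a small draft LLM and a large target LLM, and the verification step simulates the single batched forward pass a real system would use.

```python
# Toy sketch of (greedy) speculative decoding. The two "models" below are
# hypothetical stand-ins: each maps a token sequence to its next token.

def draft_model(seq):
    # Cheap draft model: a simple rule that usually agrees with the target.
    return (seq[-1] + 1) % 10

def target_model(seq):
    # Expensive target model: the reference next-token function.
    # It disagrees with the draft model only after token 7.
    return (seq[-1] + 1) % 10 if seq[-1] != 7 else 0

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens tokens: draft k at a time with the small model,
    then verify them with the large model (one batched pass in practice)."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(seq)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix the target model agrees with.
        accepted, ctx = [], list(seq)
        for t in draft:
            if target_model(ctx) != t:
                break
            accepted.append(t)
            ctx.append(t)
        # 3) On a mismatch, take one token from the target model instead,
        #    so every iteration makes progress.
        if len(accepted) < len(draft):
            accepted.append(target_model(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_tokens]
```

Because every accepted token is one the target model would itself have produced, the output matches plain greedy decoding with the large model, while most tokens are drafted cheaply and verified in bulk.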
In the context of programming language generation, however, unique structural and syntactic characteristics can offer additional opportunities for optimization. Unlike natural language, programming languages adhere to rigid syntactic rules and patterns, such as function and variable declarations, that provide context early in the generation process. This thesis investigates the potential for leveraging these syntactic structures within Speculative Decoding, aiming to further streamline the generation process for code. By integrating structured syntax with speculative execution techniques, this research seeks to advance computational efficiency in code generation, presenting a path toward resource-efficient LLMs that retain high performance in programming-specific tasks.