University of Illinois Urbana-Champaign
Efficient code-specific speculative decoding: Enhancing predictive accuracy and execution speed
Wang, Yuhan
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/127359
Description
Title
Efficient code-specific speculative decoding: Enhancing predictive accuracy and execution speed
Author(s)
Wang, Yuhan
Issue Date
2024-12-09
Director of Research (if dissertation) or Advisor (if thesis)
Zhang, Lingming
Department of Study
Siebel School Comp & Data Sci
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
LLM
speculative decoding
Abstract
Large Language Models (LLMs) have gained prominence due to their remarkable performance across diverse domains, providing high-quality and contextually relevant outputs that can be easily adapted to various applications. The success of these models has captured researchers' attention, particularly regarding the trade-offs between computational cost and performance: while larger LLMs yield superior outputs compared to their smaller counterparts, they also require significantly more computational and memory resources. This resource-intensive nature poses challenges in real-world applications, as scaling these models requires extensive infrastructure that may not be accessible or sustainable for all users and applications.
To address these limitations, recent research has proposed Speculative Decoding—an approach that takes advantage of the efficiency of smaller models while preserving the performance benefits of larger ones. Speculative Decoding is based on the observation that many complex tasks can be decomposed into simpler subtasks, which both large and small models can handle with similar accuracy. By using a smaller model to perform preliminary inferences and selectively involving the larger model for more complex instances, this method significantly reduces resource consumption without sacrificing performance quality.
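The draft-then-verify loop described above can be sketched as follows. This is a minimal toy illustration, not the thesis's implementation: the "models" are hypothetical deterministic next-token functions standing in for a small draft LLM and a large target LLM, and the verification step simulates the single batched forward pass a real system would use.

```python
# Toy sketch of (greedy) speculative decoding. The two "models" below are
# hypothetical stand-ins: each maps a token sequence to its next token.

def draft_model(seq):
    # Cheap draft model: a simple rule that usually agrees with the target.
    return (seq[-1] + 1) % 10

def target_model(seq):
    # Expensive target model: the reference next-token function.
    # It disagrees with the draft model only after token 7.
    return (seq[-1] + 1) % 10 if seq[-1] != 7 else 0

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens tokens: draft k at a time with the small model,
    then verify them with the large model (one batched pass in practice)."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(seq)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix the target model agrees with.
        accepted, ctx = [], list(seq)
        for t in draft:
            if target_model(ctx) != t:
                break
            accepted.append(t)
            ctx.append(t)
        # 3) On a mismatch, take one token from the target model instead,
        #    so every iteration makes progress.
        if len(accepted) < len(draft):
            accepted.append(target_model(seq + accepted))
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_tokens]
```

Because every accepted token is one the target model would itself have produced, the output matches plain greedy decoding with the large model, while most tokens are drafted cheaply and verified in bulk.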
In the context of programming language generation, however, unique structural and syntactic characteristics can offer additional opportunities for optimization. Unlike natural language, programming languages adhere to rigid syntactic rules and patterns, such as function and variable declarations, that provide context early in the generation process. This thesis investigates the potential for leveraging these syntactic structures within Speculative Decoding, aiming to further streamline the generation process for code. By integrating structured syntax with speculative execution techniques, this research seeks to advance computational efficiency in code generation, presenting a path toward resource-efficient LLMs that retain high performance in programming-specific tasks.