Crafting unusual programs for fuzzing deep learning libraries

Yang, Shujing

Permalink

https://hdl.handle.net/2142/120389

Description

Title

Crafting unusual programs for fuzzing deep learning libraries

Author(s)

Yang, Shujing

Issue Date

2023-04-20

Director of Research (if dissertation) or Advisor (if thesis)

Zhang, Lingming

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Software Engineering
Language Models

Language

eng

Abstract

Deep Learning (DL) applications play a vital role in modern society. Bugs in DL libraries can significantly impact a wide range of downstream DL applications, making it crucial to develop effective testing techniques for these libraries. Generating valid input programs for fuzzing DL libraries is challenging, as they must adhere to both the syntax/semantics of the supported languages (e.g., Python) and the tensor/operator constraints required for constructing valid computational graphs. The recent TitanFuzz work has shown, for the first time, that modern Large Language Models (LLMs) can be directly employed to implicitly learn all language and DL computation constraints to create valid programs for fuzzing DL libraries. However, LLMs tend to generate ordinary programs that follow patterns/tokens similar to typical programs found in their vast training corpora (e.g., GitHub), whereas fuzzing favors unusual inputs that cover edge cases or are less likely to be manually produced. To address this challenge, we propose AtlasFuzz, the first technique to prime LLMs for synthesizing unusual programs to enhance fuzzing effectiveness. AtlasFuzz is based on the well-established hypothesis that historically bug-triggering programs may contain rare and valuable code elements crucial for bug discovery. While traditional techniques leveraging such historical information demand extensive human effort to design dedicated generators and ensure the syntactic/semantic validity of the generated programs, AtlasFuzz demonstrates that this process can be fully automated through the intrinsic capabilities of LLMs (including fine-tuning and in-context learning) and is generalizable and applicable to challenging domains. Furthermore, AtlasFuzz highlights the potential of directly utilizing the instruction-following capability of the recent ChatGPT for effective fuzzing.
Our experimental study on two popular DL libraries (PyTorch and TensorFlow) reveals that AtlasFuzz is an effective fuzzer for DL libraries, detecting 18 bugs, including 10 already confirmed as previously unknown bugs.
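To make the in-context-learning idea above more concrete, the following is a minimal sketch, under our own assumptions, of how historically bug-triggering programs might be packed into a few-shot prompt that primes an LLM toward a target API. It is not AtlasFuzz's actual prompt format or implementation: the function `build_few_shot_prompt`, its `max_examples` parameter, the comment-style prompt headers, and the two example snippets are all hypothetical, and the snippets are placeholders rather than verified bug triggers. No model is called here; the function only assembles the prompt text.

```python
from typing import List


def build_few_shot_prompt(examples: List[str], target_api: str, max_examples: int = 3) -> str:
    """Hypothetical helper: assemble a few-shot prompt from historical programs.

    Each historical program is shown as an in-context example, followed by an
    instruction asking the model to write a new program for `target_api`.
    """
    parts = []
    # Keep only the first `max_examples` snippets to bound the prompt length.
    for i, code in enumerate(examples[:max_examples], start=1):
        parts.append(f"# Example {i}: a program that once triggered a library bug\n{code.strip()}\n")
    # Final instruction that steers generation toward the API under test.
    parts.append(f"# Now write a new, unusual program that calls {target_api}\n")
    return "\n".join(parts)


# Placeholder snippets for illustration only (not verified bug triggers).
historical = [
    "import torch\nx = torch.empty(0, 3)\ntorch.linalg.svd(x)",
    "import torch\ntorch.nn.functional.pad(torch.ones(1), (-2, 0))",
]
prompt = build_few_shot_prompt(historical, "torch.nn.functional.interpolate")
print(prompt)
```

Selecting which historical programs to include, and whether to fine-tune the model on them instead of (or in addition to) prompting, are separate design decisions that this sketch does not attempt to model.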

Graduation Semester

2023-05

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/120389

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science