Enhancing automated debugging for Java programs through data-flow analysis and automated program repair
Ouyang, Yicheng
Permalink
https://hdl.handle.net/2142/129927
Description
- Title
- Enhancing automated debugging for Java programs through data-flow analysis and automated program repair
- Author(s)
- Ouyang, Yicheng
- Issue Date
- 2025-07-10
- Director of Research (if dissertation) or Advisor (if thesis)
- Zhang, Lingming
- Doctoral Committee Chair(s)
- Zhang, Lingming
- Committee Member(s)
- Marinov, Darko
- Jabbarvand, Reyhaneh
- Yang, Wei
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Automated Debugging
- Data Flow Analysis
- Taint Analysis
- Automated Program Repair
- Patch Correctness Checking
- Large Language Models
- Abstract
- As software systems become increasingly complex and ubiquitous, automated debugging techniques are crucial for maintaining software quality and developer productivity. Developers constantly encounter bugs in their programs and rely on various debugging approaches to identify, understand, and fix these issues. However, modern software environments present three critical challenges that hinder effective automated debugging for Java programs. First, existing taint analysis techniques are inadequate for JVM-based microservice systems, resulting in a security assurance gap that leaves these systems vulnerable to severe threats. Existing static approaches suffer from high false positive rates and are unable to handle the dynamic behaviors prevalent in microservices, while existing dynamic approaches often lack portability and adaptability due to intrusive runtime modifications. Second, although automated program repair (APR) techniques have been widely studied, deficiencies in evaluation practices, such as inconsistent experimental settings, undermine the reliability and comprehensiveness of existing APR evaluations. Third, the absence of effective automated patch-correctness checking (PCC) techniques creates a bottleneck for practical APR adoption, as manual inspection remains necessary to assess patch correctness. Existing PCC methods are limited in accuracy, and while large language models (LLMs) show promise, the most effective strategies for leveraging them have yet to be fully explored. This dissertation addresses these critical challenges through three main contributions that enhance automated debugging for Java programs by advancing both data flow analysis and automated program repair techniques.
To advance data flow analysis, this dissertation presents MirrorTaint, the first practical, non-intrusive dynamic taint analysis technique specifically designed for microservice systems on JVMs. By constructing mirrored data structures and replicating stack-based JVM instruction execution on the fly, MirrorTaint achieves superior compatibility and significantly higher recall (100%) compared to state-of-the-art techniques such as Phosphor (9.86%) and FlowDroid (28.17%). A case study at Ant Group, a billion-user global FinTech company, demonstrates that MirrorTaint can automatically find 98.97% of data relations with 100% precision, substantially outperforming developer-experience-based approaches, which only cover 84.02% of total data relations.
To evaluate the effectiveness of automated program repair, this dissertation presents an extensive, multi-dimensional evaluation of twelve existing APR techniques (nine learning-based and three traditional) under uniform settings. Using both the widely studied Defects4J V2.0.0 benchmark and a newly constructed, large-scale mutation-based benchmark named MuBench, which contains 1,700 artificial bugs, the evaluation analyzes 1,814,652 generated patches across multiple dimensions. The comprehensive evaluation yields several insights into the effectiveness and limitations of current APR approaches. For example, it shows that LLM-based APR is less prone to overfitting and achieves the highest bug-fixing rates, while traditional and learning-based techniques offer complementary strengths in addressing different categories of bugs.
To enhance automated patch-correctness checking, this dissertation presents comprehensive techniques for automated PCC using advanced LLMs. Through a systematic investigation of eight auxiliary information settings and seven prompting strategies, along with novel classification-based and ranking-based paradigms, the proposed LLM-based PCC techniques achieve substantial improvements over state-of-the-art PCC techniques. The optimized classification-based approach increases F1 scores by 0.1674 to 0.6621 compared to state-of-the-art methods, while the ranking-based approach delivers improvements of 66.3% in AVR and 57.1% in MAR for patch prioritization effectiveness, placing the first correct patch at an average position of 1.39.
Overall, the work in this dissertation has advanced automated debugging capabilities for Java programs through three major contributions: (1) enabling dynamic taint analysis in modern JVM-based microservice systems through the non-intrusive MirrorTaint technique, which has been successfully deployed and validated in industrial settings; (2) conducting a comprehensive and standardized study of existing APR techniques, which offers insights into both traditional and learning-based approaches and provides actionable guidelines for enhancing APR effectiveness; and (3) establishing effective LLM-based paradigms for automated patch-correctness checking that have the potential to substantially reduce the manual effort required from developers to inspect automatically generated patches, making APR more practical and reliable in real-world software development.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129927
- Copyright and License Information
- Copyright 2025 Yicheng Ouyang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois