Toward AI-augmented data analysis: challenges and opportunities
Hu, Chuxuan
This item's files can only be accessed by the System Administrators group.
Permalink
https://hdl.handle.net/2142/132791
Description
- Title
- Toward AI-augmented data analysis: challenges and opportunities
- Author(s)
- Hu, Chuxuan
- Issue Date
- 2025-12-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Kang, Daniel
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Databases
- Artificial Intelligence
- Abstract
- Real-world data analysis remains challenging for many users, especially domain experts, because it involves heterogeneous data formats, complex multi-step processing pipelines, and deep technical expertise. Although recent advances in large language models have motivated systems for natural language to SQL translation, semantic query execution, and agentic data retrieval, these systems remain limited to simple analytical tasks over standard data modalities. This thesis systematically investigates the limitations of AI-assisted data analysis along two critical dimensions: (1) data complexity and (2) analytic complexity. Specifically, it evaluates how well current AI systems handle data in complex, irregular forms and how reliably they can execute analytical workflows that move beyond straightforward SQL query translation. For data complexity, we introduce Chart2CSV, a benchmark of 812 real-world scientific charts paired with expert-validated ground-truth tables, and show that state-of-the-art vision language models misinterpret nearly half of the data points. For analytic complexity, we introduce REPRO-Bench, a benchmark of 112 social science reproducibility tasks, and demonstrate that existing AI agents achieve at most 21.4 percent accuracy. Even with our improved system, REPRO-Agent, performance remains far from adequate for practical use. Together, these results show that existing AI systems lack the perceptual, reasoning, and multi-step planning capabilities necessary for reliable real-world data analysis, highlighting substantial open challenges for future research.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132791
- Copyright and License Information
- Copyright 2025 Chuxuan Hu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois