Toward AI-augmented data analysis: challenges and opportunities
Hu, Chuxuan
This item's files can only be accessed by the System Administrators group.
Permalink
https://hdl.handle.net/2142/132791
Description
Title
Toward AI-augmented data analysis: challenges and opportunities
Author(s)
Hu, Chuxuan
Issue Date
2025-12-03
Director of Research (if dissertation) or Advisor (if thesis)
Kang, Daniel
Department of Study
Siebel School Comp & Data Sci
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Databases
Artificial Intelligence
Language
eng
Abstract
Real-world data analysis remains challenging for many users, especially domain experts, because it involves heterogeneous data formats and complex multi-step processing pipelines and demands deep technical expertise. Although recent advances in large language models have motivated systems for natural-language-to-SQL translation, semantic query execution, and agentic data retrieval, these systems remain limited to simple analytical tasks over standard data modalities.
This thesis systematically investigates the limitations of AI-assisted data analysis along two critical dimensions: (1) data complexity and (2) analytic complexity. Specifically, it evaluates how well current AI systems handle data in complex, irregular forms and how reliably they can execute analytical workflows that move beyond straightforward SQL query translation. For data complexity, we introduce Chart2CSV, a benchmark of 812 real-world scientific charts paired with expert-validated ground-truth tables, and show that state-of-the-art vision language models misinterpret nearly half of the data points. For analytic complexity, we introduce REPRO-Bench, a benchmark of 112 social science reproducibility tasks, and demonstrate that existing AI agents achieve at most 21.4 percent accuracy. Even with our improved system, REPRO-Agent, performance remains far from adequate for practical use.
Together, these results show that existing AI systems lack the perceptual, reasoning, and multi-step planning capabilities necessary for reliable real-world data analysis, highlighting substantial open challenges for future research.