Accelerating queries for structured and unstructured data
Jin, Tengjun
Permalink
https://hdl.handle.net/2142/124567
Description
- Title
- Accelerating queries for structured and unstructured data
- Author(s)
- Jin, Tengjun
- Issue Date
- 2024-04-30
- Director of Research (if dissertation) or Advisor (if thesis)
- Kang, Daniel
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Database Systems
- Machine Learning
- Approximate Query Processing
- Language
- eng
- Abstract
- Data analytics is important for making data-driven decisions. As data volumes expand, the efficiency and cost of executing queries become critical concerns for analysts. Traditionally, analytics systems have prioritized structured data. Approximate Query Processing (AQP) systems, which provide faster aggregation queries by delivering approximate results, have been developed to enhance efficiency. However, they see limited use in real applications due to compatibility issues with popular databases and restrictions on the types of queries they can handle. To overcome these limitations, we have designed an innovative AQP system that functions as middleware. This system uses online sampling techniques to accelerate aggregation queries and can meet user-specified error targets. With advancements in machine learning (ML), analysts are increasingly interested in analyzing unstructured data (videos, images, text, and audio) to extract semantic information. Current analytics systems typically integrate ML models through user-defined functions (UDFs). These UDFs can be difficult to optimize and require application users to write complex, nested table expressions. To address these challenges, we introduce a new data model, AIDM, which enables users to query ML model outputs as standard SQL tables through virtual columns and virtual tables. We implement AIDM, along with novel optimizations for accelerating both approximate and exact queries, in AIDB. Our evaluations show that the AQP system can provide speedups of up to 87x and that AIDB can reduce the number of ML model invocations by up to 98%.
- Graduation Semester
- 2024-05
- Type of Resource
- Text
- Handle URL
- https://hdl.handle.net/2142/124567
- Copyright and License Information
- Copyright 2024 Tengjun Jin
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering