Optimization opportunities for various heterogeneous pipelines
Patel, Krut Sachindev
Permalink
https://hdl.handle.net/2142/129340
Description
- Title
- Optimization opportunities for various heterogeneous pipelines
- Author(s)
- Patel, Krut Sachindev
- Issue Date
- 2025-05-08
- Director of Research (if dissertation) or Advisor (if thesis)
- Mendis, Charith
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Data Processing
- Machine Learning
- Heterogeneous Systems
- Abstract
- Data preprocessing is an important step in machine learning workloads and accounts for an increasingly large share of total run time. Frameworks such as tf.data and DataJuicer have gained popularity because they provide simple abstractions for defining, and potentially parallelizing, preprocessing pipelines. More recent work has explored further optimizations, such as offloading computation to separate devices and reordering operators to reduce data transfer costs. However, these approaches require users to specify the data dependencies between operators manually. This thesis studies and uncovers optimization opportunities for preprocessing pipelines. Guided by detailed performance profiling, we investigate the exact conditions under which reordering provides benefits. We also analyze the impact of jointly optimizing operator reordering and device placement. Additionally, we explore automatically searching the space of possible optimizations, using the ML model itself as a search metric. Moreover, we examine the data pipeline of a state-of-the-art multimodal model and describe the novel aspects of its performance characteristics. Finally, we conclude by describing the open problems that must be tackled to build data preprocessing systems for modern machine learning workloads.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129340
- Copyright and License Information
- Copyright 2025 Krut Patel
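To make the reordering trade-off mentioned in the abstract concrete, here is a minimal sketch under simplified assumptions; it is not the method from the thesis. Each operator is modeled with a hypothetical per-element cost, a selectivity (the fraction of elements it passes on), and the sets of fields it reads and writes. Two adjacent operators can be swapped only if neither depends on what the other writes, which is the kind of data-dependency information the abstract says earlier systems require users to supply by hand. The `Op` class, `pipeline_cost`, `can_swap`, the operator names, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical operator model, for illustration only."""
    name: str
    cost_per_item: float                       # assumed cost per input element
    selectivity: float = 1.0                   # fraction of elements passed on
    reads: set = field(default_factory=set)    # fields the operator reads
    writes: set = field(default_factory=set)   # fields the operator writes


def pipeline_cost(ops, n_items):
    """Modeled cost of running ops in order over n_items input elements."""
    total, n = 0.0, float(n_items)
    for op in ops:
        total += n * op.cost_per_item  # every element reaching this op pays its cost
        n *= op.selectivity            # downstream ops see only the survivors
    return total


def can_swap(first, second):
    """True if the two adjacent ops have no read/write or write/write conflict."""
    conflict = (
        first.writes & (second.reads | second.writes)
        or second.writes & first.reads
    )
    return not conflict


# Hypothetical operators: an expensive decode and two cheap filters.
decode = Op("decode", 5.0, reads={"bytes"}, writes={"pixels"})
by_label = Op("filter_by_label", 0.1, selectivity=0.2, reads={"label"})
by_pixels = Op("filter_dark_images", 0.1, selectivity=0.2, reads={"pixels"})

if __name__ == "__main__":
    print(pipeline_cost([decode, by_label], 1000))   # decode first
    print(pipeline_cost([by_label, decode], 1000))   # filter first
    print(can_swap(decode, by_label))   # label does not depend on decoding
    print(can_swap(decode, by_pixels))  # filter needs the decoded pixels
```

With these made-up numbers, running the cheap label filter before the expensive decode lowers the modeled cost from 5100 to 1100 units, while a filter that reads the decoded pixels cannot be moved ahead of the decode at all. The toy model ignores transfer costs between devices and assumes a filter's selectivity does not change when it is moved, both of which matter in real heterogeneous pipelines.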
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois