QStore: quantization-aware compressed model storage
Shah, Raunak
Permalink
https://hdl.handle.net/2142/129293
Description
- Title
- QStore: quantization-aware compressed model storage
- Author(s)
- Shah, Raunak
- Issue Date
- 2025-05-06
- Director of Research (if dissertation) or Advisor (if thesis)
- Park, Yongjoo
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Compression
- Quantization
- LLMs
- File Formats
- Storage
- Machine Learning
- AI
- Abstract
- Modern applications commonly leverage large, multi-modal foundation models. Such use cases often feature complex workflows that require storing and using models in multiple precisions. However, to make model development accessible to the average user, model providers typically maintain a separate file for each precision (e.g., INT8, BF16); hence, naively handling multi-precision workflows (e.g., downloading and storing each precision separately) can incur prohibitive storage costs. We present QStore, a unified, lossless compression format for simultaneously storing a model in two (high and low) precisions. QStore’s encoding scheme stores a pair of different-precision models at even lower storage cost than storing the high-precision model alone: it compresses the low-precision model, then stores a novel representation of the ‘conditional information’ present in the high-precision model but not in the low-precision model. For model usage, QStore allows direct access to the low-precision model via decoding; if the high-precision model is required, QStore recovers it losslessly by applying the stored conditional information to the decoded low-precision model. We evaluate QStore on multiple precisions of popular foundation models and show that it reduces the overall storage footprint by up to 2.2× (to 45% of the original size) while enabling up to 1.7× faster model saving and 1.8× faster model loading than existing approaches.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129293
- Copyright and License Information
- Copyright 2025 Raunak Shah
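
The following is a minimal, illustrative sketch of the idea described in the abstract, not QStore's actual format or API. It assumes the simplest possible precision pair: FP32 as the high precision and BF16 as the low precision, where BF16 is obtained by truncating the lower 16 bits of each FP32 value, so the ‘conditional information’ is exactly those truncated bits. The function names and layout are hypothetical.

import numpy as np

def encode(fp32_weights: np.ndarray):
    # Split each FP32 weight into its top 16 bits (a valid BF16 payload,
    # i.e., the low-precision model) and its bottom 16 bits (the residual,
    # standing in for QStore's "conditional information").
    bits = fp32_weights.view(np.uint32)
    low = (bits >> np.uint32(16)).astype(np.uint16)
    residual = (bits & np.uint32(0xFFFF)).astype(np.uint16)
    return low, residual

def decode_low(low: np.ndarray) -> np.ndarray:
    # Direct access to the low-precision model: widen the BF16 payload
    # back to FP32 values (the truncated mantissa bits stay zero).
    return (low.astype(np.uint32) << np.uint32(16)).view(np.float32)

def decode_high(low: np.ndarray, residual: np.ndarray) -> np.ndarray:
    # Lossless recovery of the high-precision model: apply the stored
    # residual bits onto the decoded low-precision payload.
    bits = (low.astype(np.uint32) << np.uint32(16)) | residual.astype(np.uint32)
    return bits.view(np.float32)

if __name__ == "__main__":
    w = np.random.randn(1 << 20).astype(np.float32)        # one "layer" of weights
    low, residual = encode(w)
    assert np.array_equal(decode_high(low, residual), w)   # exact round-trip

In the actual QStore setting, the low-precision model would typically be a quantized format such as INT8, and both the low-precision weights and the conditional information are compressed rather than stored as raw bits; the lossless round-trip property illustrated above is the same.
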
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)