An end-to-end benchmarking framework for retrieval-augmented generation systems
Xu, Yuan
Permalink
https://hdl.handle.net/2142/132602
Description
- Title
- An end-to-end benchmarking framework for retrieval-augmented generation systems
- Author(s)
- Xu, Yuan
- Issue Date
- 2025-12-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Huang, Jian
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Retrieval-Augmented Generation (RAG)
- Benchmarking framework
- AI system performance
- Abstract
- As Large Language Models (LLMs) transition from experimental prototypes to production-grade services, Retrieval-Augmented Generation (RAG) has emerged as the de facto paradigm for mitigating hallucinations and incorporating up-to-date knowledge. While the accuracy of RAG systems has been extensively studied, their system performance, specifically throughput, latency, and memory efficiency at scale, remains largely unexplored. Current RAG benchmarking efforts are predominantly accuracy-centric, focusing on metrics such as precision and recall while neglecting the impact of the underlying retrieval infrastructure and generation bottlenecks. Consequently, developers face significant challenges in navigating the complex trade-offs between vector database configurations, retrieval strategies, and generative model parameters. This thesis presents a RAG-based AI system benchmarking framework (RASB) for characterizing the system performance of RAG pipelines. To enable a holistic evaluation, RASB decouples the RAG workflow into modular components (embedding, indexing, retrieval, and generation), allowing for fine-grained analysis of each stage. We rethink the evaluation methodology by shifting the focus from pure answer quality to system efficiency, exploring multiple dimensions such as batch size, vector database index type, and embedding dimension. RASB provides a testbed that supports modular RAG pipelines with major vector databases and LLM backends, automating the collection of metrics that include end-to-end throughput, GPU memory consumption, and context recall. To evaluate diverse usage scenarios, RASB integrates a configurable workload generator that drives experiments using both real-world and synthetic datasets. We demonstrate RASB’s capability through a comprehensive set of experiments conducted on popular vector databases and LLM backends.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132602
- Copyright and License Information
- Copyright 2025 Yuan Xu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois