An end-to-end benchmarking framework for retrieval-augmented generation systems

Xu, Yuan

Permalink

https://hdl.handle.net/2142/132602

Description

Title

An end-to-end benchmarking framework for retrieval-augmented generation systems

Author(s)

Xu, Yuan

Issue Date

2025-12-11

Director of Research (if dissertation) or Advisor (if thesis)

Huang, Jian

Department of Study

Electrical and Computer Engineering

Discipline

Electrical and Computer Engineering

Degree Granting Institution

University of Illinois Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Retrieval-Augmented Generation (RAG)
Benchmarking framework
AI system performance

Language

eng

Abstract

As Large Language Models (LLMs) transition from experimental prototypes to production-grade services, Retrieval-Augmented Generation (RAG) has emerged as the de facto paradigm for mitigating hallucinations and incorporating up-to-date knowledge. While the accuracy of RAG systems has been extensively studied, their system performance, specifically throughput, latency, and memory efficiency at scale, remains largely unexplored. Current RAG benchmarking efforts are predominantly accuracy-centric, focusing on metrics such as precision and recall while neglecting the implications of the underlying retrieval infrastructure and generation bottlenecks. Consequently, developers face significant challenges in navigating the complex trade-offs among vector database configurations, retrieval strategies, and generative model parameters. This thesis presents a RAG-based AI system benchmarking framework (RASB) for characterizing the system performance of RAG pipelines. To enable a holistic evaluation, RASB decouples the RAG workflow into modular components (embedding, indexing, retrieval, and generation), allowing for fine-grained analysis of each stage. We rethink the evaluation methodology by shifting the focus from pure answer quality to system efficiency, exploring multiple dimensions such as batch size, vector database index type, and embedding dimension. RASB provides a testbed that supports modular RAG pipelines with major vector databases and LLM backends, automating the collection of performance metrics that include end-to-end throughput, GPU memory consumption, and context recall. To evaluate diverse usage scenarios, RASB integrates a configurable workload generator that drives experiments using both real-world and synthetic datasets. We demonstrate RASB's capability through a comprehensive set of experiments conducted on popular vector databases and LLM backends.
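
The stage-wise measurement idea described in the abstract can be illustrated with a minimal sketch. This is not RASB's actual API, which this record does not show: the names `run_pipeline`, `throughput`, `embed`, `retrieve`, and `generate` are hypothetical, and the stages are toy stand-ins for a real embedding model, vector database, and LLM backend. The offline indexing stage is left out of the query path for brevity.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; both are hypothetical placeholders.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], batch: Any) -> Tuple[Any, Dict[str, float]]:
    """Run the stages in order and record each stage's wall-clock latency (seconds)."""
    timings: Dict[str, float] = {}
    data = batch
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def throughput(num_queries: int, timings: Dict[str, float]) -> float:
    """End-to-end throughput in queries per second, summed over all stages."""
    total = sum(timings.values())
    return num_queries / total if total > 0 else float("inf")


# Toy stand-ins (assumptions): a real setup would call an embedding model,
# a vector database client, and an LLM backend here.
def embed(queries: List[str]) -> List[List[float]]:
    return [[float(len(q))] for q in queries]


def retrieve(vectors: List[List[float]]) -> List[List[str]]:
    return [["doc-%d" % int(v[0])] for v in vectors]


def generate(contexts: List[List[str]]) -> List[str]:
    return ["answer using " + c[0] for c in contexts]


stages = [("embedding", embed), ("retrieval", retrieve), ("generation", generate)]
queries = ["what is RAG?", "why benchmark?"]
answers, timings = run_pipeline(stages, queries)
qps = throughput(len(queries), timings)
```

Keeping each stage behind the same callable interface is what would let a harness of this kind swap index types, batch sizes, or backends while reusing the same timing code.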

Graduation Semester

2025-12

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/132602

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Electrical and Computer Engineering

Dissertations and Theses in Electrical and Computer Engineering
