Analysis of Data Streaming Accelerator in Intel Sapphire Rapids Xeon Scalable Processors
Kuper, Reese
Permalink
https://hdl.handle.net/2142/120387
Description
- Title
- Analysis of Data Streaming Accelerator in Intel Sapphire Rapids Xeon Scalable Processors
- Author(s)
- Kuper, Reese
- Issue Date
- 2023-05-03
- Advisor
- Kim, Nam Sung
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Intel DSA
- On-Chip Accelerators
- SoC
- DMA
- Data Streaming
- Hardware Accelerators
- Abstract
- As semiconductor power density no longer remains constant as process technology scales down, modern CPUs integrate capable on-chip data accelerators that aim to improve performance and efficiency across a wide range of applications. One such accelerator is the Intel Data Streaming Accelerator (Intel DSA), introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). Intel DSA targets data movement operations in memory, a common source of overhead in datacenter workloads and infrastructure. It is also far more versatile than a simple DMA engine, supporting a wider range of operations on streaming data such as CRC32 calculation, delta record creation/merging, and data integrity field (DIF) operations. Several architectural innovations facilitate its practical use, for example, shared virtual memory (SVM) and new x86 instructions for lock-free work descriptor submission and synchronization. This thesis introduces the latest features supported by Intel DSA, deep-dives into its versatility, and analyzes its throughput and performance benefits through a comprehensive evaluation. Our analysis demonstrates that Intel DSA saves 37.3% and 71.3% of CPU cycles when synchronously offloading 1 KB memory copy operations with batch sizes of 1 and 4, respectively, compared to the software counterpart (i.e., memcpy() running on a core). This frees cores to spend precious cycles on more complex and latency-sensitive tasks rather than on simple but repetitive operations. When the same operations are offloaded asynchronously with batch sizes of 1 and 4, Intel DSA provides 2.3x and 6.4x higher throughput than the software counterpart, respectively. Beyond these inherent benefits, we demonstrate that Intel DSA avoids polluting performance-critical resources (i.e., on-chip caches) and thus eliminates performance interference with co-running memory-intensive or latency-sensitive applications. Along with this characterization, we explore use cases that can benefit from Intel DSA (DPDK-based VirtIO, SPDK-based NVMe-oF, cloud data caching services, and HPC/ML frameworks) and describe other potential use cases. Finally, we provide several guidelines to help users employ Intel DSA effectively.
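The lock-free descriptor submission mentioned in the abstract works by building a 64-byte work descriptor in memory and writing it to a device portal with the new MOVDIR64B or ENQCMD x86 instructions. As an illustration only (not taken from the thesis), the sketch below constructs a simplified memory-move descriptor; the field layout and the memory-move opcode (0x3) are adapted from the Linux idxd UAPI header (include/uapi/linux/idxd.h). Real use additionally requires an accel-config-configured DSA device and the actual portal write, both omitted here.

```c
#include <stdint.h>
#include <string.h>

/* Simplified 64-byte DSA work descriptor, adapted from the Linux idxd
 * UAPI (struct dsa_hw_desc in include/uapi/linux/idxd.h). Reserved
 * fields are kept only so the struct stays exactly 64 bytes. */
struct dsa_desc {
    uint32_t pasid  : 20;  /* process address space ID (with SVM) */
    uint32_t rsvd   : 11;
    uint32_t priv   : 1;
    uint32_t flags  : 24;  /* e.g. request a completion record */
    uint32_t opcode : 8;   /* 0x3 = memory move (DSA_OPCODE_MEMMOVE) */
    uint64_t completion_addr;
    uint64_t src_addr;
    uint64_t dst_addr;
    uint32_t xfer_size;    /* bytes to move */
    uint16_t int_handle;
    uint16_t rsvd1;
    uint8_t  rsvd2[24];
} __attribute__((packed));

/* Build a memory-move descriptor. On real hardware the descriptor
 * would then be submitted to a work-queue portal with MOVDIR64B
 * (dedicated queue) or ENQCMD (shared queue) — omitted here because
 * it requires a configured DSA device. */
static struct dsa_desc make_memmove_desc(const void *src, void *dst,
                                         uint32_t len)
{
    struct dsa_desc d;
    memset(&d, 0, sizeof d);
    d.opcode    = 0x3;                       /* memory move */
    d.src_addr  = (uint64_t)(uintptr_t)src;
    d.dst_addr  = (uint64_t)(uintptr_t)dst;
    d.xfer_size = len;
    return d;
}
```

Batched offload (the batch sizes of 1 and 4 evaluated in the thesis) uses the same mechanism with a batch descriptor pointing at an array of such 64-byte descriptors, amortizing one submission over several operations.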
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/120387
- Copyright and License Information
- Copyright 2023 Reese Kuper
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)