Efficient visual generation for community player and production service
Zhang, Chengsong
Permalink
https://hdl.handle.net/2142/129257
Description
- Title
- Efficient visual generation for community player and production service
- Author(s)
- Zhang, Chengsong
- Issue Date
- 2025-04-28
- Director of Research (if dissertation) or Advisor (if thesis)
- Lai, Fan
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Image generation, ML system
- Abstract
- Morpheus improves the speed and throughput of serving diffusion transformer models (DiTs) for high-resolution image generation by distributing computation across GPUs and adaptively skipping computation in less salient image regions. Unlike large language models (LLMs), whose serving is often I/O-bound, image generation models such as DiTs are predominantly compute-bound, especially at high resolutions. Moreover, regions within a single generated image differ in both computational difficulty and semantic importance. To address these challenges, we introduce five technical contributions in Morpheus. First, we use cross-attention analysis to dynamically partition the image into salient and less salient regions. Second, we develop an algorithm that divides the total computation into sub-tasks according to both regional saliency and the available compute resources. Third, we propose an adaptive skip algorithm that decides on the fly, during the diffusion process, when computation for less salient regions can be skipped without significant quality loss. Fourth, a centralized task scheduler dynamically assigns sub-tasks to workers and fills the idle time created by skipping with sub-tasks from other queued requests, raising overall throughput. Finally, we employ an asynchronous communication pattern for KV cache management and migration, which enables efficient parallel execution by hiding communication latency behind computation. Serving the Flux model on four A100 40 GB GPUs, Morpheus achieves a 2.1× speedup over single-GPU execution with minimal degradation in image quality.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129257
- Copyright and License Information
- Copyright 2025 Chengsong Zhang
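The abstract does not specify how saliency is scored or when a region is skipped. The following is a minimal, hypothetical sketch of those two ideas, not the thesis's implementation. It assumes a cross-attention map indexed as (text token, image patch), scores each patch by the mean attention it receives, marks the top `keep_fraction` of patches as salient, and skips a less salient region when its latent changed by less than a relative tolerance `tol` between the last two denoising steps. All names and parameters (`patch_saliency`, `split_salient`, `should_skip`, `keep_fraction`, `tol`) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating saliency partitioning and a skip rule;
# they are not taken from the Morpheus code base.


def patch_saliency(cross_attn: Sequence[Sequence[float]]) -> List[float]:
    """Score each image patch by the mean attention it receives.

    cross_attn[t][p] is an assumed attention weight from text token t to
    image patch p (for example, already averaged over heads and layers).
    """
    num_tokens = len(cross_attn)
    num_patches = len(cross_attn[0])
    return [
        sum(row[p] for row in cross_attn) / num_tokens
        for p in range(num_patches)
    ]


def split_salient(
    saliency: Sequence[float], keep_fraction: float
) -> Tuple[List[int], List[int]]:
    """Return (salient, less_salient) patch indices.

    The top keep_fraction of patches by score are salient; at least one
    patch is always kept salient.
    """
    order = sorted(range(len(saliency)), key=lambda p: saliency[p], reverse=True)
    k = max(1, round(keep_fraction * len(saliency)))
    return sorted(order[:k]), sorted(order[k:])


def should_skip(prev: Sequence[float], curr: Sequence[float], tol: float) -> bool:
    """Decide whether to reuse a less salient region's latent.

    Skips when the relative L1 change between the last two denoising
    steps is below tol (an assumed criterion for "converged enough").
    """
    diff = sum(abs(a - b) for a, b in zip(prev, curr))
    norm = sum(abs(b) for b in curr) or 1.0  # avoid division by zero
    return diff / norm < tol
```

For example, with attention rows `[0.1, 0.6, 0.2, 0.1]` and `[0.1, 0.4, 0.4, 0.1]`, patches 1 and 2 come out salient at `keep_fraction=0.5`, and patches 0 and 3 become candidates for skipping. A ranking-based split is used here only because it is simple; a threshold on raw attention or a learned criterion would be equally plausible readings of the abstract.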
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois