Efficient visual generation for community player and production service

Zhang, Chengsong

Permalink

https://hdl.handle.net/2142/129257

Description

Title

Efficient visual generation for community player and production service

Author(s)

Zhang, Chengsong

Issue Date

2025-04-28

Director of Research (if dissertation) or Advisor (if thesis)

Lai, Fan

Department of Study

Siebel School Comp & Data Sci

Discipline

Computer Science

Degree Granting Institution

University of Illinois Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Image generation
ML system

Language

eng

Abstract

Morpheus improves the performance and throughput of serving diffusion transformer models (DiTs) for high-resolution image generation by strategically distributing computation across GPUs and adaptively skipping less salient regions. Unlike large language models (LLMs), which are often I/O bound, image generation models like DiTs are predominantly compute-bound, especially at high resolutions. Furthermore, different regions within an image generation task exhibit varying levels of computational difficulty and semantic importance. Morpheus makes five technical contributions to address these challenges. First, we leverage a cross-attention-based analysis to dynamically identify and partition the image into salient and less salient regions. Second, we develop an algorithm that partitions the total computation into sub-tasks, considering both regional saliency and available compute resources. Third, we propose an adaptive skip algorithm that decides on the fly, during the diffusion process, when to skip computation for less salient regions without significant quality loss. Fourth, a centralized task scheduler dynamically assigns sub-tasks to workers, filling potential idle time arising from skipping with sub-tasks from other queued requests to boost overall throughput. Finally, we employ an asynchronous communication pattern for KV cache management and migration, which is crucial for enabling efficient parallel execution and hiding communication latency behind computation. Our evaluation of Morpheus serving the Flux model on four A100 40 GB GPUs demonstrates a 2.1x speedup compared to single-GPU execution, with minimal image quality degradation.
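
To make the idea of saliency-based partitioning concrete, the sketch below splits image patches into salient and less salient groups by ranking a per-patch score. It is only an illustration under stated assumptions, not the thesis's implementation: the function name `partition_by_saliency`, the `salient_fraction` parameter, and the use of a simple top-fraction cut are all hypothetical, and the scores are assumed to be cross-attention mass already aggregated per patch.

```python
from typing import List, Tuple


def partition_by_saliency(
    scores: List[float], salient_fraction: float = 0.5
) -> Tuple[List[int], List[int]]:
    """Split patch indices into (salient, less_salient) by score rank.

    `scores[i]` is assumed to be the aggregated cross-attention weight
    that patch i receives from the text prompt (hypothetical input).
    """
    if not 0.0 <= salient_fraction <= 1.0:
        raise ValueError("salient_fraction must be in [0, 1]")
    n_salient = round(len(scores) * salient_fraction)
    # Rank patch indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return each group in spatial (index) order.
    salient = sorted(ranked[:n_salient])
    less_salient = sorted(ranked[n_salient:])
    return salient, less_salient


# Six patches with made-up scores; keep the top half as salient.
scores = [0.05, 0.40, 0.30, 0.02, 0.20, 0.03]
salient, less_salient = partition_by_saliency(scores, salient_fraction=0.5)
print(salient)       # [1, 2, 4]
print(less_salient)  # [0, 3, 5]
```

In a system of the kind the abstract describes, the less salient group would be the candidate for skipped computation, while the salient group would be divided into sub-tasks across the available GPUs.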

Graduation Semester

2025-05

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/129257

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
