Improving diffusion models for enhanced accuracy, control, and realism in virtual try-on
Zhang, Jeffrey
Permalink
https://hdl.handle.net/2142/125788
Description
- Title
- Improving diffusion models for enhanced accuracy, control, and realism in virtual try-on
- Author(s)
- Zhang, Jeffrey
- Issue Date
- 2024-07-08
- Director of Research (if dissertation) or Advisor (if thesis)
- Forsyth, David Alexander
- Doctoral Committee Chair(s)
- Forsyth, David Alexander
- Committee Member(s)
- Lazebnik, Svetlana
- Schwing, Alexander Gerhard
- Berg, Tamara Lee
- Department of Study
- Siebel Computing & Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Diffusion
- Virtual Try-On
- Controllability
- Computer Vision
- Fashion
- Abstract
- This thesis advances virtual try-on (VTON) methods, which generate images of people wearing specific garments. We prioritize garment accuracy, which is essential for faithfully representing products and ensuring a reliable virtual shopping experience. Traditionally, generative adversarial networks (GANs) and warping methods have been the best means to achieve this, and we present several production-ready VTON contributions using these techniques. However, these methods often fall short in image quality compared to newer diffusion models. Diffusion methods, while providing improved image quality, frequently fail to maintain precise garment details (changing the product) and lack the ability to style garments. This thesis merges diffusion and warping techniques to improve both accuracy and image quality, while tackling other VTON challenges, such as control of garment styles, support for a wide range of people, and fast inference speed. We address challenges in standard diffusion formulations that compromise garment accuracy and control in VTON. First, diffusion models may exhibit background artifacts and shifts in image distributions during inference. To counter these issues, we introduce consistent initialization strategies that eliminate inconsistencies between training and testing procedures, resulting in consistent image distributions and fewer artifacts. Second, we tackle compression errors from variational autoencoders (VAEs) that distort critical high-frequency garment details. Our automated process identifies and upsamples high-error regions during VAE processing, mitigating these errors. Finally, diffusion methods tend to hallucinate garment details, which changes garment identities and makes generations unreliable. We introduce a novel diffusion-based VTON training scheme that uses carefully engineered control images to ensure accurate garment details, high quality, and complete garment control.
Our proposed VTON diffusion method has several key advantages due to its enhanced control. First, it enables multi-garment try-on, a feature seen in only a handful of prior works. Second, it supports fine-grained layering, styling, and shoe try-on. Few prior works support comprehensive styling and layering features, and even fewer have addressed shoe try-on. Finally, our method supports high-quality zoomed-in image generation without the need to train or run inference at higher resolution. This contribution is unique to our system, allowing for detailed close-up VTON images. Qualitative results, quantitative metrics, and user studies together show that our method significantly outperforms others in image quality and in accurately preserving garment details (e.g., text, logos, textures, and patterns).
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125788
- Copyright and License Information
- Copyright 2024 Jeffrey Zhang
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois