This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/129555
Description
Title
Inference-driven perceptual optimization
Author(s)
Xu, Xin
Issue Date
2025-04-22
Director of Research (if dissertation) or Advisor (if thesis)
Wang, Yuxiong
Department of Study
Siebel School of Computing and Data Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Perception
Inference
Diffusion Models
Sparsity
Abstract
Perception is fundamental to modern computer vision. While most research has focused on improving datasets, model architectures, or training objectives, this thesis explores a complementary yet under-explored perspective: optimizing perception models through inference guidance. The core insight is that models often exhibit informative behaviors during inference, such as deviations from training assumptions or signals that reveal opportunities for further optimization. Although typically overlooked, these behaviors can be leveraged to improve model effectiveness and efficiency. To demonstrate the idea, this thesis presents two case studies, each targeting one of these dimensions.
The first part investigates Inference Guidance for Diffusion Models, aiming to improve model effectiveness by aligning training objectives with inference behavior. Diffusion models, increasingly adopted for perception tasks, generate samples through an iterative denoising process. However, this process often misaligns with the objectives of discriminative tasks. Inference-time analysis reveals two key issues: (1) denoising timesteps contribute unevenly to perception quality, and (2) a distribution shift between training and inference leads to performance degradation. To address these challenges, we propose ADDP (Aligning Diffusion Denoising with Perception), a framework that reweights the training objective based on timestep importance and introduces data augmentations to simulate inference-time denoising errors. Experiments on depth estimation and referring image segmentation tasks demonstrate that ADDP improves both perceptual alignment and overall performance.
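The two ideas behind ADDP described above can be illustrated in a minimal sketch: a denoising loss reweighted by per-timestep importance, plus an augmentation that perturbs the model input to mimic accumulated inference-time denoising error. All names (`timestep_weights`, `tau`, `sigma`) and the exponential weighting shape are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def timestep_weights(num_steps, tau=0.5):
    # Hypothetical importance profile: weight late (low-noise) timesteps
    # more heavily, since they contribute most to perception quality.
    t = np.linspace(0.0, 1.0, num_steps)
    w = np.exp(-t / tau)
    return w / w.sum()  # normalize so weights sum to 1

def reweighted_denoising_loss(pred_noise, true_noise, t_idx, weights):
    # Per-sample MSE on predicted noise, scaled by each sample's timestep weight.
    per_sample = ((pred_noise - true_noise) ** 2).mean(axis=-1)
    return float((weights[t_idx] * per_sample).sum())

def simulate_inference_error(x, sigma=0.05, rng=None):
    # Augmentation: add small Gaussian perturbations to the clean training
    # input to emulate the train/inference distribution shift.
    rng = np.random.default_rng(0) if rng is None else rng
    return x + sigma * rng.standard_normal(x.shape)
```

In a real training loop the weight would multiply each sample's loss before backpropagation; here the functions only show the shape of the computation.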
The second part focuses on Inference Acceleration for Visual Encoders, aiming to enhance model efficiency by leveraging inference properties. Analysis of pre-trained vision transformers (ViTs) shows that sparsity naturally emerges in the attention patterns of deeper layers, enabling selective computation without sacrificing performance. Building on this observation, we introduce SVE (Sparse Vision Encoder), a framework that identifies target layers, restores performance through distillation, and reduces inference latency via sparsity prediction. Experiments across multiple vision encoders show that SVE achieves up to a 23% speedup while maintaining accuracy on classification and segmentation benchmarks.
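The sparsity that SVE exploits can be sketched with a toy single-head attention that keeps only the top-scoring keys per query, standing in for the selective computation applied in deeper layers. The function name, the `keep_ratio` parameter, and the top-k selection rule are assumptions for illustration; the thesis's actual sparsity predictor is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, keep_ratio=0.5):
    # Toy single-head attention: for each query, attend only to the
    # top-(keep_ratio * num_keys) keys, masking the rest to -inf.
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (num_q, num_k)
    n_keep = max(1, int(keep_ratio * k.shape[0]))
    # Threshold per query row at its n_keep-th largest score.
    thresh = np.sort(scores, axis=-1)[:, -n_keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    return softmax(masked, axis=-1) @ v               # (num_q, d_v)
```

With `keep_ratio=1.0` this reduces to dense attention, which makes the sketch easy to sanity-check; the speedup in practice comes from skipping the masked computation rather than masking after the fact.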
Together, these studies advocate for inference-aware optimization as a promising paradigm for advancing perception models. They demonstrate that careful analysis of inference-time behavior can uncover new opportunities for performance gains beyond conventional training-time interventions.