Uni4D: Unifying visual foundation models for 4D modeling from a single video
Yao, David Yifan
Permalink
https://hdl.handle.net/2142/129198
Description
Title
Uni4D: Unifying visual foundation models for 4D modeling from a single video
Author(s)
Yao, David Yifan
Issue Date
2025-04-15
Director of Research (if dissertation) or Advisor (if thesis)
Wang, Shenlong
Department of Study
Siebel School of Computing and Data Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Computer Vision
Machine Learning
Structure from Motion
4D Reconstruction
Dynamic Modeling
Video Depth Estimation
Pose Estimation
Language
eng
Abstract
This paper presents a unified approach to understanding dynamic scenes from casual videos. Large pretrained vision foundation models, such as vision-language, video depth prediction, motion tracking, and segmentation models, offer promising capabilities. However, training a single model for comprehensive 4D understanding remains challenging. We introduce Uni4D, a multi-stage optimization framework that harnesses multiple pretrained models to advance dynamic 3D modeling, including static/dynamic reconstruction, camera pose estimation, and dense 3D motion tracking. Our results show state-of-the-art performance in dynamic 4D modeling with superior visual quality. Notably, Uni4D requires no retraining or fine-tuning, highlighting the effectiveness of repurposing visual foundation models for 4D understanding.