From objects to worlds: scalable learning of 3D assets
Huang, Zixuan
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/132639
Description
Title
From objects to worlds: scalable learning of 3D assets
Author(s)
Huang, Zixuan
Issue Date
2025-11-13
Director of Research (if dissertation) or Advisor (if thesis)
Rehg, James M.
Doctoral Committee Chair(s)
Rehg, James M.
Committee Member(s)
Schwing, Alexander
Wang, Shenlong
Wu, Jiajun
Vedaldi, Andrea
Department of Study
Siebel School Comp & Data Sci
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
3D Generation
3D Reconstruction
Video Generation
World Models
Abstract
Learning to reconstruct and generate the 3D world is a fundamental research problem in computer vision, with critical applications across diverse domains. However, the development of robust 3D generation and reconstruction systems is hindered by the scarcity of high-quality 3D data. This thesis aims to address this scaling challenge along several dimensions. First, we introduce ShapeClipper, which leverages semantic consistency from unlabeled 2D images to learn 3D shape reconstruction models. This enables scalable 3D learning from only single-view images, without any 3D annotations. Second, we present PointInfinity, a resolution-invariant point diffusion model for learning continuous 3D surfaces from point clouds. PointInfinity facilitates 3D learning using noisy point clouds derived from object-centric videos. Third, we introduce ZeroShape and re-examine the classical regression-based 3D reconstruction approach. We show that it outperforms diffusion methods in accuracy, as well as in computational and data efficiency. Finally, we explore the feasibility of learning 3D from in-the-wild videos without any 3D prior or data. As an initial yet solid step, we evaluate the 3D awareness of recent video foundation models and find that state-of-the-art video generative models already possess strong 3D understanding. Together, this thesis makes significant advances in the scalable learning of 3D, providing practical solutions for reconstructing and generating 3D objects and worlds under limited high-quality 3D data.