Modeling and editing 4D scenes by leveraging structural priors
Lyu, Jipeng
Permalink
https://hdl.handle.net/2142/129184
Description
- Title
- Modeling and editing 4D scenes by leveraging structural priors
- Author(s)
- Lyu, Jipeng
- Issue Date
- 2025-04-22
- Director of Research (if dissertation) or Advisor (if thesis)
- Wang, Yuxiong
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- 4D Scene Understanding
- Geometric Structural Priors
- Semantic Video Editing
- 3D Gaussian Splatting
- Diffusion Models
- Abstract
- 4D scene understanding powers applications ranging from AR/VR and robotics to controllable video generation. The goal is to build representations that can faithfully model and manipulate real-world environments as they evolve over time, capturing both how scenes deform (modeling) and how they respond to high-level user instructions (editing). A popular approach is to represent scenes using compact geometric primitives such as 3D Gaussians, which enable efficient rendering and temporal consistency across frames. While recent advances in 3D reconstruction and video editing have shown promising results, many existing methods still overlook a key aspect: structure. Whether in the form of spatial rigidity or semantic hierarchy, structural priors are both abundant and underexplored. This thesis investigates how incorporating such priors, both geometric and semantic, into 4D scene modeling and editing can enhance efficiency, controllability, and generalization. The first part of this thesis focuses on geometric structural priors in dynamic 3D modeling. Many dynamic scenes exhibit coherent change patterns: objects often deform in groups, move rigidly or semi-rigidly, or follow interpretable part-wise trajectories. Instead of modeling motion independently for each element, we propose a structural cascaded optimization framework that organizes 3D Gaussians into a coarse-to-fine hierarchy. This structure allows us to parameterize deformation using simple transformations (rotation, translation, and scaling), substantially accelerating optimization. It also enables dense point tracking and motion-based segmentation without requiring semantic labels. These results demonstrate the potential of structured representations for fast and interpretable 4D scene modeling. The second part explores semantic structural priors in video editing. User instructions often involve multiple entangled goals that are difficult to fulfill through a single transformation. To address this, we employ large language models (LLMs) to decompose complex prompts into interpretable semantic subgoals. Each subgoal defines an editing stage, executed within a training-free diffusion-based video editing framework. To accommodate varying subgoal complexity, we further prompt the LLM to estimate editing difficulty and adapt the interpolation schedule accordingly. This results in smoother transitions and robust edits, transforming the process into a semantically grounded and interpretable sequence. Together, these contributions highlight the value of structural reasoning in 4D scene understanding. By bridging geometric modeling and semantic editing, this thesis offers unified insights into building efficient, robust, and controllable 4D systems guided by structural priors.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129184
- Copyright and License Information
- Copyright 2025 Jipeng Lyu
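Illustrative note: the abstract describes parameterizing deformation with one simple transformation (rotation, translation, and scaling) per group of 3D Gaussians rather than per element. The following is a minimal sketch of that general idea only, not the thesis's implementation. It assumes a single grouping level instead of the full coarse-to-fine hierarchy, restricts rotation to the z-axis for brevity, and transforms only the Gaussian centers. All function names (`rotation_z`, `transform_group`, `deform`) are hypothetical.

```python
import math


def rotation_z(theta):
    """Return a 3x3 rotation matrix about the z-axis (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def centroid(points):
    """Mean position of a non-empty list of 3D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def transform_group(points, rotation, translation, scale):
    """Apply x' = scale * R (x - c) + c + t, where c is the group centroid."""
    c = centroid(points)
    moved = []
    for p in points:
        d = [p[i] - c[i] for i in range(3)]
        r = [sum(rotation[i][j] * d[j] for j in range(3)) for i in range(3)]
        moved.append([scale * r[i] + c[i] + translation[i] for i in range(3)])
    return moved


def deform(centers, group_ids, group_params):
    """Move every Gaussian center with the transform shared by its group.

    group_params maps a group id to (rotation, translation, scale), so the
    number of motion parameters grows with the number of groups, not with
    the number of Gaussians.
    """
    groups = {}
    for idx, g in enumerate(group_ids):
        groups.setdefault(g, []).append(idx)
    result = [None] * len(centers)
    for g, indices in groups.items():
        rotation, translation, scale = group_params[g]
        moved = transform_group(
            [centers[i] for i in indices], rotation, translation, scale
        )
        for i, p in zip(indices, moved):
            result[i] = p
    return result


# Hypothetical toy data: two Gaussians in group 0, one in group 1.
centers = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
group_ids = [0, 0, 1]
group_params = {
    0: (rotation_z(math.pi / 2), [0.0, 0.0, 1.0], 2.0),
    1: (rotation_z(0.0), [1.0, 0.0, 0.0], 1.0),
}
deformed = deform(centers, group_ids, group_params)
```

In this toy setup, three Gaussians are driven by two transforms. A hierarchical variant would presumably apply such transforms at several levels, from coarse groups down to finer ones, but that structure is not specified in this record.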
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois