Hands in action: from 4D reconstruction to animation and robotics
Aditya Prakash
Permalink
https://hdl.handle.net/2142/132522
Description
- Title
- Hands in action: from 4D reconstruction to animation and robotics
- Author(s)
- Aditya Prakash
- Issue Date
- 2025-11-24
- Director of Research (if dissertation) or Advisor (if thesis)
- Gupta, Saurabh
- Doctoral Committee Chair(s)
- Gupta, Saurabh
- Committee Member(s)
- Forsyth, David
- Lazebnik, Svetlana
- Damen, Dima
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- computer vision
- robotics
- Abstract
- Hands are central to human interactions in the real world, enabling us to perform many tasks with different objects on a daily basis. Imagine if we could develop AI agents, either digital (in the form of animation sequences) or physical (such as real-world robots), that interact with the environment in the same way our hands do. Digital agents could enable coaching applications on mixed reality devices by overlaying animations onto objects in 3D space; for example, they could teach us how to move our hands and fingers to switch between notes while playing a piano, or to tighten a bolt when repairing a machine. We could even have household robots perform everyday tasks such as loading the dishwasher, doing laundry, folding clothes, organizing shelves, or cooking. To develop such AI agents, it is crucial to study different aspects of hand-object interactions, including the 3D structure of hands, objects, contact regions, and motion sequences. A central challenge in training these agents is acquiring 3D data for machine learning models that make 3D predictions. Unlike 2D labels, which can be obtained from human labelers, 3D annotations require extensive lab setups. These setups often constrain the interactions that can be captured, which hinders the performance of models trained on lab data when they are transferred to everyday scenarios. Fortunately, there is an abundance of egocentric videos showing in-the-wild hand-object interactions with a large diversity of objects and actions. However, unlike 2D annotations, 3D labels for these videos cannot be obtained from human labelers, which limits their utility for training learning-based 3D prediction methods. In this work, we propose methods for 4D reconstruction of hands, objects, and contacts from large-scale everyday videos, extending beyond controlled lab settings.
This work rests on two main insights: (1) 2D information, such as segmentation masks and keypoints, can be estimated effectively from videos using large vision models; (2) 3D shape priors can be extracted from lab and synthetic sources and combined with 2D masks and keypoints for 3D reconstruction. Using the data generated by these reconstruction models, we develop techniques for 4D motion forecasting (to generate animations) and for robot learning from human videos. Overall, this dissertation lays the groundwork for scaling up learning-based methods for 4D reconstruction of hand-object interactions from everyday videos, with applications in animation and robotics. We hope it serves as a stepping stone towards developing both digital and physical AI agents that perform tasks the way human hands do.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132522
- Copyright and License Information
- Copyright 2025 - Aditya Prakash
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)