Hands in action: from 4D reconstruction to animation and robotics
Aditya Prakash
Permalink
https://hdl.handle.net/2142/132522
Description
- Title
- Hands in action: from 4D reconstruction to animation and robotics
- Author(s)
- Aditya Prakash
- Issue Date
- 2025-11-24
- Director of Research (if dissertation) or Advisor (if thesis)
- Gupta, Saurabh
- Doctoral Committee Chair(s)
- Gupta, Saurabh
- Committee Member(s)
- Forsyth, David
- Lazebnik, Svetlana
- Damen, Dima
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- computer vision
- robotics
- Abstract
- Hands are central to human interactions in the real world, enabling us to perform many tasks with different objects on a daily basis. Imagine if we could develop AI agents, either digital (in the form of animation sequences) or physical (such as real-world robots), that interact with the environment in the same way our hands do. Digital agents could enable coaching applications on mixed reality devices by overlaying animations onto objects in 3D space; for example, they could teach us how to move our hands and fingers to switch between notes while playing a piano, or to tighten a bolt when repairing a machine. We could even have household robots perform everyday tasks such as loading the dishwasher, doing laundry, folding clothes, organizing shelves, or cooking. To develop such AI agents, it is crucial to study different aspects of hand-object interactions, including the 3D structure of hands, objects, contact regions, and motion sequences. A central challenge in training these agents is acquiring 3D data for machine learning models that make 3D predictions. Unlike 2D labels, which can be obtained from human labelers, 3D annotations require extensive lab setups. These setups often constrain the interactions that can be captured, which hinders the performance of models trained on lab data when they are transferred to everyday scenarios. Fortunately, there is an abundance of egocentric videos showing in-the-wild hand-object interactions with a large diversity of objects and actions. However, unlike 2D annotations, 3D labels for these videos cannot be obtained from human labelers, which limits their utility for training learning-based 3D prediction methods. In this work, we propose methods for 4D reconstruction of hands, objects, and contacts from large-scale everyday videos, extending beyond controlled lab settings.
This work rests on two main insights: (1) 2D information, such as segmentation masks and keypoints, can be estimated effectively from videos using large vision models; (2) 3D shape priors can be extracted from lab and synthetic sources and combined with 2D masks and keypoints for 3D reconstruction. Using the data generated by these reconstruction models, we develop techniques for 4D motion forecasting (to generate animations) and for robot learning from human videos. Overall, this dissertation lays the groundwork for scaling up learning-based methods for 4D reconstruction of hand-object interactions from everyday videos, with applications in animation and robotics. We hope it serves as a stepping stone towards developing both digital and physical AI agents that perform tasks the way human hands do.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132522
- Copyright and License Information
- Copyright 2025 - Aditya Prakash
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)