Towards holistic scene understanding from monocular video
Hu, Yuan-Ting
Permalink
https://hdl.handle.net/2142/116107
Description
Title
Towards holistic scene understanding from monocular video
Author(s)
Hu, Yuan-Ting
Issue Date
2022-07-15
Director of Research (if dissertation) or Advisor (if thesis)
Schwing, Alexander Gerhard
Doctoral Committee Chair(s)
Schwing, Alexander Gerhard
Committee Member(s)
Forsyth, David
Hoiem, Derek
Patel, Sanjay
Huang, Jia-Bin
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Scene understanding
Video understanding
Video segmentation
Object segmentation
Amodal segmentation
3D reconstruction
Abstract
Humans have the remarkable ability to vividly envision future scenarios as they are capable of understanding scenes in a holistic manner. They can extrapolate scene information such as object shapes and interactions from the observed scene content and its dynamics. Importantly, they can reason about unseen information, e.g., when objects are partially observed. In contrast, while computer vision and machine learning systems can successfully explain observations, it remains challenging to develop autonomous agents that can infer the unseen and have a holistic understanding of the environment. In this dissertation, we discuss techniques that tackle research problems related to holistic scene understanding from monocular video data.
To study holistic scene understanding from monocular video, we first present models for human pose understanding from video. Second, we study the research problem of tracking moving objects under challenging conditions such as occlusion and appearance change. Third, we consider a challenging task, amodal understanding of objects in a scene from video, which aims to infer the entirety of objects even when they are only partially observed. To enable data-driven approaches to video amodal perception, we present a large-scale video dataset in which more than 1.8 million objects are annotated with amodal labels. With the proposed dataset, we study and present video algorithms that infer the unseen and understand scene dynamics as well as 3D shapes from partially occluded data. Last, we present a method showing how geometric cues predicted from 2D can improve 3D understanding of objects in the scene. We then conclude and discuss future directions towards holistic video scene understanding.