Towards holistic scene understanding from monocular video
Hu, Yuan-Ting
Permalink
https://hdl.handle.net/2142/116107
Description
Title
Towards holistic scene understanding from monocular video
Author(s)
Hu, Yuan-Ting
Issue Date
2022-07-15
Director of Research (if dissertation) or Advisor (if thesis)
Schwing, Alexander Gerhard
Doctoral Committee Chair(s)
Schwing, Alexander Gerhard
Committee Member(s)
Forsyth, David
Hoiem, Derek
Patel, Sanjay
Huang, Jia-Bin
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Scene understanding
Video understanding
Video segmentation
Object segmentation
Amodal segmentation
3D reconstruction
Abstract
Humans have the remarkable ability to vividly envision future scenarios as they are capable of understanding scenes in a holistic manner. They can extrapolate scene information such as object shapes and interactions from the observed scene content and its dynamics. Importantly, they can reason about unseen information, e.g., when objects are partially observed. In contrast, while computer vision and machine learning systems can successfully explain observations, it remains challenging to develop autonomous agents that can infer the unseen and have a holistic understanding of the environment. In this dissertation, we discuss techniques that tackle research problems related to holistic scene understanding from monocular video data.
To study holistic scene understanding from monocular video, we first present models for human pose understanding from video. Second, we study the research problem of tracking moving objects under challenging conditions such as occlusion and appearance change. Third, we consider a challenging task, amodal understanding of objects in a scene from video, which aims to infer the entirety of objects even when they are only partially observed. To enable data-driven approaches to video amodal perception, we present a large-scale video dataset in which more than 1.8 million objects are annotated with amodal labels. With the proposed dataset, we study and present video algorithms that infer the unseen and understand scene dynamics as well as 3D shapes from partially occluded data. Last, we present a method showing how geometric cues predicted from 2D can improve 3D understanding of objects in the scene. We then conclude and discuss future directions towards holistic video scene understanding.