Files in this item

File: Ruiqi_Guo.pdf (28MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Scene understanding with complete scenes and structured representations
Author(s): Guo, Ruiqi
Director of Research: Hoiem, Derek W.
Doctoral Committee Chair(s): Hoiem, Derek W.
Doctoral Committee Member(s): Forsyth, David A.; Roth, Dan; Urtasun, Raquel
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Scene Understanding; Computer Vision; Machine Learning; Computer Graphics; Image Parsing; Image Segmentation; RGB-D images
Abstract: Humans can understand scenes in abundant detail: they see layouts, surfaces, and the shapes of objects, among other details. By contrast, many machine-based scene analysis algorithms use simple representations to parse scenes, mainly bounding boxes and pixel labels, and apply only to visible regions. We believe we should move to deeper levels of scene analysis, embracing a more comprehensive, structured representation. In this dissertation, we focus on analyzing scenes to their complete extent and in structured detail. First, our work uses a structured representation that is closer to human interpretation, with a mixture of layout, functional objects, and clutter. We developed annotation tools and collected a dataset of 1449 rooms annotated with detailed 3D models. Another feature of our work is that we understand scenes to their complete extent, even the parts beyond the line of sight. We present a simple framework that detects the visible portion of a scene with appearance-based models and then infers the occluded portion with a contextual approach. We integrate context from surrounding regions, a spatial prior, and the shape regularity of background surfaces. Our method is applicable to 2D images and can also be used to infer support surfaces in 3D scenes. Our complete-surface prediction quantitatively outperforms relevant baselines, especially when surfaces are occluded. Finally, we present a system that interprets single-view RGB-D images of indoor scenes into our proposed representation. Such a scene interpretation is useful for robotics and visual reasoning but is difficult to produce due to the well-known challenge of segmenting objects, the high degree of occlusion, and the diversity of objects in indoor scenes. We take a data-driven approach, generating sets of potential object regions, matching them with regions in training images, and transferring and aligning the associated 3D models while encouraging them to be consistent with observed depths.
To the best of our knowledge, this is the first automatic system capable of interpreting scenes into 3D models with similar levels of detail.
Issue Date: 2014-09-16
URI: http://hdl.handle.net/2142/50564
Rights Information: Copyright 2014 Ruiqi Guo
Date Available in IDEALS: 2014-09-16
Date Deposited: 2014-08

