Files in this item



CERVANTES-THESIS-2018.pdf (application/pdf, 5MB)


Title: Entity-based scene understanding
Author(s): Cervantes, Christopher Michael
Advisor(s): Hockenmaier, Julia
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): coreference, bridging, grounding, neural networks, LSTM, Flickr30k Entities, MSCOCO
Abstract: Unifying multiple descriptions to determine the details of an everyday event can be a challenging task for humans. Though incorporating other modalities like images or videos can help humans unify such descriptions, this remains a challenging task for computational systems. We define entity-based scene understanding as the task of identifying the entities in a visual scene from multiple descriptions. This task subsumes coreference resolution, bridging resolution, and grounding to produce mutually consistent relations between entity mentions and groundings between mentions and image regions. Using neural classifiers and integer linear program inference, we show that grounding is improved when forced to conform to relation predictions. We introduce the Flickr30k Entities v2 dataset, and show how our methods can be used to automatically generate similarly rich annotations for the MSCOCO dataset.
Issue Date: 2018-04-25
Rights Information: Copyright 2018 Christopher Cervantes
Date Available in IDEALS: 2018-09-04
Date Deposited: 2018-05
