Files in this item



application/pdf: Farhadi_Ali.pdf (93MB)


Title: Designing representational architectures in recognition
Author(s): Farhadi, Ali
Director of Research: Forsyth, David A.
Doctoral Committee Member(s): Malik, Jitendra; Freeman, William; Roth, Dan; Hoiem, Derek W.; Yagnik, Jay
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Computer vision; object recognition; visual attributes; visual phrases
Abstract: Recognition is a deep and fundamental question in computer vision. If approached correctly, object recognition provides insight into several interesting problems with crucial applications. In a typical setting, recognition is defined as the problem of learning about a fixed set of categories from training examples provided for those categories. At test time, the problem is to decide which of those learned categories a test image belongs to. This thesis questions the typical settings of recognition and shows remarkable achievements as a result of shifting our point of view to the fundamentals of recognition.
In current settings, the final goal of recognition systems is to predict a list of category name tags for images. But there is more to recognition than a list of category names. Images exhibit a great deal of information that cannot be conveyed with a list of name tags. The main focus of this thesis is to produce richer descriptions for images. Inspired by how humans describe images, our goal is to describe images with sentences. This thesis introduces a non-parametric approach for describing images with sentences that produces promising results.
Exploring the idea of describing images with sentences raises deep and interesting concerns in recognition: how to deal with unfamiliar objects, how to describe objects, and how to recognize complex composites of objects. This thesis introduces visual attributes and shows how attribute-based recognition can reason about unfamiliar objects. Attribute-based recognition also allows the description of objects, the reporting of unusual properties of familiar objects, and learning about novel categories with few or even no visual training examples (from purely textual descriptions of categories). Analogous to phrases in machine translation, this thesis also introduces visual phrases: elements of recognition that correspond to a chunk of meaning bigger than objects and smaller than scenes. Visual phrases have such a characteristic appearance that detecting them as one entity is much simpler and significantly more accurate than detecting the participating objects. This thesis shows that including visual phrases in the vocabulary of recognition results in significant improvements in recognition.
Current common practices in recognition are formed around problem settings that have been copied from the initial settings of recognition problems, and they ignore tremendous progress in machinery and methods. With the astonishing developments in recognition, I believe we should rethink recognition. Recognition should be redefined to match the capacity of current methods, with the applications of recognition tasks in mind. In this thesis I question the usual settings of recognition from several different angles and show remarkable achievements as a result of shifting our point of view on the recognition problem.
There are two main categories of issues that this thesis is concerned with: knowledge transfer and knowledge formation. Knowledge transfer is the capability of transferring knowledge gained in learning one task to relevant but new tasks. For example, learning how the appearance of some objects changes across viewpoints may help the recognition system reason about the change in the appearance of other objects. Knowledge formation is the ability to reshape the knowledge representation into the form most suitable for a specific recognition task, for example, how to describe an image in the most useful format for a desired task.
The work presented in this thesis tries to provide insight into deep and yet basic questions in recognition: What should we recognize? At what level should we recognize entities? What does learning about some objects reveal about other objects? What should we say when an unfamiliar object is presented? How can we learn to predict deviations from typicalities in categories? What should be the output of a recognition system? And what is the quantum of recognition?
The central theme of the methods presented in this thesis is learning representational architectures around recognition problems. My approaches to all these problems are centered around one fundamental observation: finding the right representation is a key component in recognition.
Issue Date: 2012-02-06
Rights Information: Copyright 2011 Ali Farhadi
Date Available in IDEALS: 2012-02-06
Date Deposited: 2011-12
