Files in this item



application/pdf: Rui_Wang.pdf (2MB)


Title: Entity finder: A system for entity web page retrieval using pseudo-relevance feedback
Author(s): Wang, Rui
Advisor(s): Zhai, ChengXiang
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Information Retrieval; Web Search; Relevance Feedback
Abstract: Collecting all the online information about a particular entity (e.g., a person or a product) is a task needed in many applications. Often a database with limited information about the entities of interest already exists, but such databases are usually incomplete and out of date. With the increasing amount of information available on the World Wide Web, crawling and searching the web may be an attractive way to make such a database more complete and up to date. In this thesis, we propose a retrieval system that crawls and searches the web in order to complete and update the information about entities that already exist in a database maintained by an organization. Taking the information stored in the database as input, the system crawls the web and retrieves the web pages that mention the entities in the database. We study several approaches to this special retrieval problem and propose a novel pseudo-relevance feedback approach to improve retrieval accuracy. We evaluate our system on a dataset containing 112 alumni of the College of Engineering of the University of Illinois, and show that the system can effectively retrieve relevant web pages about alumni and that the novel pseudo-relevance feedback method outperforms a simple baseline approach.
Issue Date: 2014-05-30
Rights Information: Copyright 2014 Rui Wang
Date Available in IDEALS: 2014-05-30
Date Deposited: 2014-05
