Challenges in Managing Information Extraction
Shen, Warren H.
Degree Granting Institution
University of Illinois at Urbana-Champaign
In this dissertation, we develop solutions to key challenges in managing information extraction (IE). First, we develop a declarative framework that makes it easier for developers to write and understand IE programs, and we show how to automatically optimize IE programs written in this framework to reduce their runtime. Next, because relational database systems (RDBMSs) were designed to store and process large data sets, we study the benefits and limitations of using RDBMSs to store and process data in IE applications. Finally, we extend our declarative framework to support best-effort IE, which lets developers more easily write and refine approximate IE programs. A key idea underlying these solutions is that many of the principles behind RDBMSs for managing structured data can be extended to IE for managing unstructured data.