Files in this item



application/pdf ALEYASEN-THESIS-2015.pdf (318kB)


Title: Entity recognition for multi-modal socio-technical systems
Author(s): Aleyasen, Amirhossein
Advisor(s): Winslett, Marianne; Diesner, Jana
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Information Extraction; Entity Recognition; Supervised Learning; Conditional Random Field
Abstract: Entity Recognition (ER) can be used to extract information about socio-technical systems from unstructured natural language text. In many current ER solutions, this process is limited by the set of entity classes considered. In this thesis, we report on the development of an ER classifier that supports a wide range of entity classes relevant for analyzing multi-modal socio-technical systems. Another limitation of current entity extractors is that they mainly support the detection of named entities, typically in the form of proper nouns. The presented solution also detects entities not referred to by a name, such as general references to places (e.g., forest) or natural resources (e.g., timber). We use supervised machine learning for this project. To overcome the data sparseness issues that result from considering a large number of entity classes, we built two separate classifiers, one predicting entity boundary labels and one predicting entity class labels. We investigate rules for merging both labels while minimizing the loss of accuracy due to this step. Our classifier achieves an accuracy of 75.9% for the largest model, which has 94 classes. We compare the performance of our solution to other standard systems on several datasets and find that, with the same number of classes, the accuracy of our classifier is comparable to that of other state-of-the-art ER packages.
Issue Date: 2015-07-22
Rights Information: Copyright 2015 Amirhossein Aleyasen
Date Available in IDEALS: 2015-09-29
Date Deposited: August 201
