Files in this item



MU-THESIS-2016.pdf (application/pdf, 1MB)


Title: Semantic modeling of the natural language of Wikipedia annotations
Author(s): Mu, Jiaqi
Advisor(s): Viswanath, Pramod; Bhat, Suma P.
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): entity embedding; knowledge base completion
Abstract: Knowledge bases (KBs) store relational facts and are a significant resource for a variety of natural language processing (NLP) tasks. Improving their coverage and refining their relations is a basic and pressing research effort. In this thesis we propose a novel approach to this canonical task using the unstructured Wikipedia corpus. We extract low-dimensional embeddings for the title pages of the Wikipedia corpus and show that they can be used to significantly outperform state-of-the-art approaches on a variety of metrics in three concrete tasks: measuring semantic relatedness, solving semantic analogies, and KB completion and refinement. A central feature of our work is a new log-linear discriminative model for the annotations inside a Wikipedia document, which we name IBOE (isotropic bag-of-entities). We hypothesize that the parameters of the model satisfy a geometric symmetry property (isotropy) and show that this property leads to self-normalization, allowing for the design of an efficient parameter estimation algorithm that we christen wiki2vec. The self-normalization property of IBOE is validated empirically on the Wikipedia corpus and is also of independent mathematical interest.
Issue Date: 2016-07-15
Rights Information: Copyright 2016 Jiaqi Mu
Date Available in IDEALS: 2016-11-10
Date Deposited: 2016-08
