Files in this item

MU-DISSERTATION-2019.pdf (2MB, application/pdf) -- Restricted Access

Title: Geometries of word embeddings
Author(s): Mu, Jiaqi
Director of Research: Viswanath, Pramod
Doctoral Committee Chair(s): Viswanath, Pramod
Doctoral Committee Member(s): Srikant, Rayadurgam; Bhat, Suma; Oh, Sewoong; Sun, Ruoyu
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): word embedding
natural language processing
representation learning
Abstract: Real-valued word embeddings have transformed natural language processing (NLP) applications, prized for their ability to capture linguistic regularities. Popular examples include word2vec, GloVe, GPT, and BERT. Word2vec and GloVe are static: a word's representation is independent of its context. GPT and BERT are contextualized: a word's representation changes with the semantics conveyed by its surrounding words. In this dissertation, we study four problems associated with the geometries of word embeddings. First, we demonstrate a very simple, yet counter-intuitive, postprocessing technique -- eliminating the common mean vector and a few top dominating directions from the word vectors -- that achieves better performance on a variety of standard benchmarks than the original embeddings. Sentences, as sequences of words, are also important semantic units of natural language. We extend word embeddings toward sentence representations via the low-rank subspace spanned by a sentence's word vectors. This unsupervised representation is empirically validated on semantic textual similarity tasks across 19 different datasets, where it outperforms sophisticated neural network models by 15% on average. A good sentence embedding, in turn, helps improve word representations. This is because a single vector does not suffice to model the polysemous nature of many (frequent) words, i.e., words with multiple meanings. We leverage the sentence representations for unsupervised polysemy modeling in an approach we call K-Grassmeans. This approach is quantitatively tested on standard sense induction and disambiguation datasets and yields new state-of-the-art results. Finally, we study contextualized word embeddings. Given the rapid growth of computational power, pretrained language models have been proposed to capture the common-sense knowledge hidden in large training corpora and have achieved great success in natural language understanding (NLU) tasks.
We study these pretrained language models using influence functions, which characterize the influence of individual training samples on the prediction for each test sample. Our empirical findings suggest an interesting future research direction: designing novel regularizations to penalize these correlations during fine-tuning.
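The postprocessing technique described in the abstract can be sketched in a few lines. This is a minimal illustration, not the dissertation's code: it assumes the common mean is removed by centering and that the "top dominating directions" are the top-D principal directions of the centered embedding matrix (here D=2, a hypothetical choice); the function name `postprocess` and the toy random embeddings are ours.

```python
import numpy as np

def postprocess(X, D=2):
    """Sketch of mean-and-top-directions removal (assumed form):
    center the word vectors, then project out the top D
    principal directions of the centered embedding matrix."""
    X = X - X.mean(axis=0)  # remove the common mean vector
    # rows of Vt are principal directions; keep the top D
    U = np.linalg.svd(X, full_matrices=False)[2][:D]  # shape (D, dim)
    return X - (X @ U.T) @ U  # subtract projections onto those directions

# toy usage: 100 random 50-dimensional "word vectors"
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 50))
out = postprocess(emb, D=2)
```

After this step the embeddings have zero mean and zero component along the removed directions; the abstract's claim is that such vectors perform better on standard benchmarks than the originals.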
Issue Date: 2019-09-04
Rights Information: Copyright 2019 Jiaqi Mu
Date Available in IDEALS: 2020-03-02
Date Deposited: 2019-12
