Files in this item
| Files | Description | Format |
| --- | --- | --- |
| | (no description provided) | application/pdf |
Description
Title: Representation learning of natural language and its application to language understanding and generation
Author(s): Gong, Hongyu
Director of Research: Bhat, Suma
Doctoral Committee Chair(s): Bhat, Suma
Doctoral Committee Member(s): Viswanath, Pramod; Srikant, Rayadurgam; Hwu, Wen-mei; Fanti, Giulia
Department / Program: Electrical & Computer Engineering
Discipline: Electrical & Computer Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Natural Language Processing; Representation Learning; Language Understanding; Language Generation
Abstract: How to represent language properly is a fundamental problem in Natural Language Processing (NLP). Language representation learning aims to encode rich information, such as the syntax and semantics of language, into dense vectors, which facilitates the modeling, manipulation, and analysis of natural language in computational linguistics. Existing algorithms exploit corpus statistics such as word co-occurrences to learn general-purpose language representations, and recent advances integrate richer information, such as contextualized features, from unlabeled text corpora. In this dissertation, we continue this line of research and incorporate rich knowledge into generic embeddings. We show that word representations can be enriched with information including temporal and spatial variation as well as syntactic function, and that text representations can be refined with topical knowledge. Moreover, we develop insights into the geometry of pre-trained representations and connect them to semantic understanding tasks such as identifying idiomatic word usage. Beyond generic representations, task-dependent representations are also studied extensively in downstream applications, where the representation is trained to encode domain information from labeled datasets. This dissertation leverages the capability of neural network models to integrate task-specific supervision into language representations: we introduce new deep learning models and algorithms that train representations with the external knowledge in annotated data. We show that the learned representations assist in downstream tasks in both language understanding, such as text classification, and language generation, such as text style transfer.
Issue Date: 2020-04-15
Type: Thesis
URI: http://hdl.handle.net/2142/108110
Rights Information: Copyright 2020 Hongyu Gong
Date Available in IDEALS: 2020-08-26
Date Deposited: 2020-05
This item appears in the following Collection(s)
- Dissertations and Theses - Electrical and Computer Engineering
- Graduate Dissertations and Theses at Illinois