Files in this item

FilesDescriptionFormat

application/pdf

application/pdfSCHIEFERSTEIN-THESIS-2018.pdf (376kB)
(no description provided)PDF

Description

Title:Improving neural language models on low-resource creole languages
Author(s):Schieferstein, Sarah
Advisor(s):Hockenmaier, Julia
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Natural Language Processing
Neural Networks
Deep Learning
Linguistics
Creole Languages
Creolistics
Abstract:When using neural models for NLP tasks, like language modelling, it is difficult to utilize a language with little data, also known as a low-resource language. Creole languages are frequently low-resource and as such it is difficult to train neural language models for them well. Creole languages are a special type of language that is widely thought of as having multiple parents and thus receiving a mix of evolutionary traits from all of them. One of a creole language’s parents is known as the lexifier, which gives the creole its lexicon, and the other parents are known as substrates, which possibly are thought to give the creole language its morphology and syntax. Creole languages are most lexically similar to their lexifier and most syntactically similar to otherwise unrelated creole languages. High lexical similarity to the lexifier is unsurprising because by definition lexifiers provide a creole’s lexicon, but high syntactic similarity to the other unrelated creole languages is not obvious and is explored in detail. We can use this information about creole languages’ unique genesis and typology to decrease the perplexity of neural language models on low-resource creole languages. We discovered that syntactically similar languages (especially other creole languages) can successfully transfer learned features during pretraining from a high-resource language to a low-resource creole language through a method called neural stacking. A method that normalized the vocabulary of a creole language to its lexifier also lowered perplexities of creole-language neural models.
Issue Date:2018-12-11
Type:Thesis
URI:http://hdl.handle.net/2142/102512
Rights Information:Copyright 2018 Sarah Schieferstein
Date Available in IDEALS:2019-02-06
Date Deposited:2018-12


This item appears in the following Collection(s)

Item Statistics