Files in this item
ZHOU-THESIS-2021.pdf (application/pdf, 650kB)

Title: Idiomatic sentence generation and paraphrasing
Author(s): Zhou, Jianing
Advisor(s): Bhat, Suma
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): natural language processing; idiom processing
Abstract: Idiomatic expressions (IEs) play an important role in natural language, and have long been a “pain in the neck” for NLP systems. Despite this, text generation tasks related to IEs remain largely under-explored. In this study, we propose two new tasks, idiomatic sentence generation and idiomatic sentence paraphrasing, to fill this research gap. As the primary resource for these tasks, we introduce a curated dataset of 823 IEs and a parallel corpus that pairs sentences containing them with the same sentences in which the IEs are replaced by their literal paraphrases. We benchmark existing deep learning models that achieve state-of-the-art performance on related tasks, using automated and manual evaluation on our dataset, to inspire further research on the proposed tasks. By establishing baseline models, we pave the way for more comprehensive and accurate modeling of IEs, both for generation and paraphrasing. Inspired by psycholinguistic theories of idiom use in one’s native language, we also propose a novel approach for these tasks. For idiomatic sentence generation, it retrieves the appropriate idiom for a given literal sentence, extracts the span of the sentence to be replaced by the idiom, and generates the idiomatic sentence by using a large pre-trained language model to combine the retrieved idiom with the remainder of the sentence. For idiomatic sentence paraphrasing, the definition of the idiom in the given idiomatic sentence is first retrieved, the idiom in the sentence is then extracted, and finally the literal counterpart is generated by a large pre-trained language model. Experiments on the novel dataset created for these tasks show that our model works effectively. Furthermore, automatic and human evaluations show that, for these tasks, the proposed model outperforms a series of competitive baseline models for text generation.
Because it generates literal counterparts of high quality, our method for idiomatic sentence paraphrasing is also used to construct a larger corpus with the help of the MAGPIE dataset. This enlarged corpus in turn helps to improve the performance of different models on idiomatic sentence generation.
Issue Date: 2021-04-23
Rights Information: Copyright 2021 Jianing Zhou
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05