Files in this item

File:ABDAR-THESIS-2016.pdf (1MB), Restricted Access
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Compiling contextualized lists of frequent vocabulary from user-supplied corpora using natural language processing techniques
Author(s):Abdar, Omid
Advisor(s):Sadler, Randall
Contributor(s):Schwartz, Lane
Department / Program:Linguistics
Discipline:Teaching of English Sec Lang
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.A.
Genre:Thesis
Subject(s):English for Specific Purposes, Vocabulary, Wordlists, Natural Language Processing
Abstract:Since there are thousands of words to learn in a new language, one common challenge for language learners and teachers is knowing which vocabulary items to prioritize and, more generally, how to set vocabulary-learning goals. Within vocabulary teaching research, one approach has been to focus on lists of the most common vocabulary. West (1953) proposed a list of the 2000 most frequent word families in English that, it was argued, were most important for learners to master. Along the same lines, Coxhead (2000) offered a list of the most common words in academic English known as the Academic Word List (AWL). Arguing that the AWL did not adequately reflect learners' specialized vocabulary needs, however, corpus linguists began to develop wordlists in specialized subject areas from an English for Specific Purposes (ESP) perspective, for students majoring in fields such as Business, Engineering, Medicine, and Law. A central theme in almost all previous endeavors to develop better wordlists has been the notion of 'representativeness': the extent to which a wordlist 'represents' the language needs of learners. This study proposes an alternative way to maximize representativeness: enable users to compile a wordlist from any text or corpus that is of interest to them, and provide the means of doing so. Using the Natural Language Toolkit (NLTK), this study shows how a few Natural Language Processing (NLP) techniques may be used to compile a list of the most common words in the Europarl corpus and to retrieve example sentences from the corpus for each word. This new approach has applications both for language learners and for preparing instructional materials in an ESP setting.
Issue Date:2016-07-15
Type:Thesis
URI:http://hdl.handle.net/2142/92955
Rights Information:Copyright 2016 Omid Abdar
Date Available in IDEALS:2016-11-10
Date Deposited:2016-08
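
The approach summarized in the abstract (count the most frequent words in a user-supplied text and attach example sentences to each) can be illustrated with a minimal sketch. This is not the thesis's code. It assumes plain regular-expression tokenization and sentence splitting in place of NLTK's tokenizers, a deliberately tiny hypothetical stopword list, and an invented sample text rather than Europarl data. The function and variable names are illustrative only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for", "this"}


def split_sentences(text):
    """Naive sentence splitter: break on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lowercase alphabetic tokens (apostrophes kept inside words)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def contextualized_wordlist(text, top_n=10, max_examples=2):
    """Return (word, count, example_sentences) for the top_n most frequent words."""
    counts = Counter()
    examples = defaultdict(list)
    for sentence in split_sentences(text):
        seen = set()  # avoid storing the same sentence twice for one word
        for token in tokenize(sentence):
            if token in STOPWORDS:
                continue
            counts[token] += 1
            if token not in seen and len(examples[token]) < max_examples:
                examples[token].append(sentence)
                seen.add(token)
    return [(word, count, examples[word]) for word, count in counts.most_common(top_n)]


# Invented sample text standing in for a user-supplied corpus.
sample = (
    "The committee adopted the report. "
    "The report on fisheries was adopted by the committee in March. "
    "Parliament will debate the report on Tuesday."
)
wordlist = contextualized_wordlist(sample, top_n=3)
for word, count, sentences in wordlist:
    print(word, count, sentences[0])
```

Counting surface forms, as here, treats "adopt" and "adopted" as different words; grouping them into word families or lemmas would need an extra normalization step that this sketch leaves out.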

