|Title:||Using syntax, semantics, and competitive scoring to predict prepositional phrase attachment sites|
|Author(s):||Gordon, Judith V.|
|Doctoral Committee Chair(s):||Morgan, Jerry L.|
|Department / Program:||Linguistics|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Abstract:||Prepositional phrase (PP) attachment is notorious for causing structural ambiguity problems in natural language processing. A structure like V-NP-PP where the PP is an optional locative PP forms a case in point. If we apply syntactic rules only, at least two different attachment sites will be possible: the noun phrase (NP) and the verb phrase (VP) that includes the verb and the NP. In some cases, only one of these attachments reflects a plausible interpretation of meaning.
Previous approaches to disambiguation have achieved limited success. I propose and test a new Semantic + Syntactic Competitive Scoring (SSCS) approach with PP-attachment rules based on the selectional properties of individual prepositions plus five different verb types and the hypernyms (more abstract categories) of the polysemous meanings of nouns (for example, car IS A vehicle IS AN entity). A computerized classification program derives the verb types after examining VPs in large corpora. Noun hypernyms are taken from Princeton's WordNet, a computerized lexicon which already contains over 65,000 nouns.
SSCS PP-attachment rules apply to VP ... NP ... PP structures, where the NP may be a complement of the verb or of an intervening preposition and any number of PPs may appear between the VP and the NP and between the NP and the PP to be attached. To develop and test these rules, I created separate randomized design data and test files from 1,648 VPs containing in/on/at prepositions. The VPs were taken from four different Penn Treebank preparsed texts.
SSCS rules achieved an overall 92% success rate for the design data and an 82% success rate for the test files. However, a few modifications of underspecified verb types and alternative attachment sites, plus a little more consideration of syntax and of the semantics of compound nouns and coordinate structures, raise these success rates to 95% for the design data and 94% for the test files. Based on these results, I estimate that only 5-6% of all VPs containing in/on/at PPs need considerations from context of discourse and/or world knowledge to predict PP-attachment sites successfully.
In addition to their implications for PP-attachment disambiguation, these results also have implications for other types of structural disambiguation and for the future development of robust lexicons and parsers able to process multiple varieties of English in large corpora.
|Rights Information:||Copyright 1995 Gordon, Judith V.|
|Date Available in IDEALS:||2011-05-07|
|Identifier in Online Catalog:||AAI9543594|
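
As a rough illustration of the idea described in the abstract, the following minimal Python sketch shows how attachment rules keyed on a preposition, a verb type, and noun hypernyms might compete by score. It is not the dissertation's rule set: the toy hypernym table (a stand-in for WordNet), the verb-type labels `placement` and `perception`, the rule weights, and the function names are all hypothetical assumptions made for this example.

```python
# Toy hypernym table standing in for WordNet (illustrative entries only).
HYPERNYMS = {
    "car": "vehicle",
    "vehicle": "entity",
    "garage": "building",
    "building": "location",
    "location": "entity",
    "morning": "time_period",
    "time_period": "abstraction",
}

# Hypothetical attachment rules. Each rule votes for one site ("NP" or "VP")
# when its preposition and every optional condition it specifies match.
RULES = [
    {"prep": "in", "site": "NP", "weight": 1,
     "head_cat": "vehicle", "pp_cat": "location"},
    {"prep": "in", "site": "VP", "weight": 2, "pp_cat": "time_period"},
    {"prep": "in", "site": "VP", "weight": 2,
     "verb_type": "placement", "pp_cat": "location"},
]


def hypernym_chain(noun):
    """Return the noun followed by its increasingly abstract categories."""
    chain = [noun]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def predict_site(verb_type, head_noun, prep, pp_noun):
    """Score NP and VP attachment and return (winner, scores).

    The winner is "undecided" when both sites receive the same score.
    """
    scores = {"NP": 0, "VP": 0}
    head_cats = set(hypernym_chain(head_noun))
    pp_cats = set(hypernym_chain(pp_noun))
    for rule in RULES:
        if rule["prep"] != prep:
            continue
        # A condition missing from the rule (None) matches anything.
        if rule.get("verb_type") not in (None, verb_type):
            continue
        if rule.get("head_cat") is not None and rule["head_cat"] not in head_cats:
            continue
        if rule.get("pp_cat") is not None and rule["pp_cat"] not in pp_cats:
            continue
        scores[rule["site"]] += rule["weight"]
    if scores["NP"] == scores["VP"]:
        return "undecided", scores
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # "saw the car in the garage" vs. "put the car in the garage"
    print(predict_site("perception", "car", "in", "garage"))
    print(predict_site("placement", "car", "in", "garage"))
```

Under these invented rules, the same V-NP-PP string receives different attachment sites depending on the verb type, which is the kind of competition between syntactic sites that the abstract describes. A real system would draw hypernyms from WordNet and derive verb types from corpus data rather than from a hand-written table.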