Title: An experiment in automatic indexing with Korean texts: A comparison of syntactico-statistical and manual methods
Author(s): Seo, Eun-Gyoung
Director of Research: Smith, Linda C.
Doctoral Committee Chair(s): Smith, Linda C.
Doctoral Committee Member(s): Allen, Bryce L.; Davis, Charles H.
Department / Program: Library and Information Science
Discipline: Library and Information Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Library Science; Computer Science
Abstract: This study was undertaken to develop practical automatic indexing techniques suitable for Korean natural-language texts. It had four purposes: to develop an automatic indexing system for Korean texts, to evaluate the efficiency of that system in comparison with manual indexing, to compare the effectiveness of weighting algorithms, and to investigate the effect of abstract length.
The basic method of this automatic indexing system was to determine the syntactic category of each text word by dictionary look-up, and then to match sequences of category symbols against a dictionary of acceptable patterns. Sequences of text words that matched one of the patterns in the dictionary were extracted as content identifiers. Finally, the system selected highly ranked content identifiers from each document based on statistical (frequency of occurrence) information.
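The three steps above (dictionary look-up of syntactic categories, matching category sequences against a pattern dictionary, and frequency-based selection) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the category symbols, the toy lexicon, the pattern set, and the function names are all hypothetical, and the input is assumed to be already segmented, with particles split from nouns, which real Korean text would require a morphological step to achieve.

```python
from collections import Counter

# Hypothetical toy dictionary mapping pre-segmented Korean words to category
# symbols: N = noun, J = particle, V = predicate. Not the study's dictionary.
LEXICON = {
    "기업": "N", "경영": "N", "전략": "N",
    "은": "J", "의": "J", "이다": "V",
}

# Hypothetical dictionary of acceptable category patterns.
PATTERNS = {("N",), ("N", "N")}


def tag(words, lexicon):
    """Dictionary look-up: one category symbol per word ('X' if not found)."""
    return [lexicon.get(w, "X") for w in words]


def extract_identifiers(words, lexicon, patterns):
    """Return every word sequence whose category sequence is an accepted pattern.

    Overlapping matches are all kept, so a noun phrase and its head noun
    can both become candidate content identifiers.
    """
    cats = tag(words, lexicon)
    max_len = max(len(p) for p in patterns)
    found = []
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            end = start + length
            if end <= len(words) and tuple(cats[start:end]) in patterns:
                found.append(" ".join(words[start:end]))
    return found


def select_identifiers(words, lexicon, patterns, k):
    """Keep the k most frequent identifiers (ties broken alphabetically)."""
    counts = Counter(extract_identifiers(words, lexicon, patterns))
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return ranked[:k]


# Example token sequence (hypothetical, pre-segmented).
words = ["경영", "전략", "은", "기업", "경영", "의", "전략", "이다"]
```

For this input, `extract_identifiers` yields `["경영", "경영 전략", "전략", "기업", "기업 경영", "경영", "전략"]`, and `select_identifiers(words, LEXICON, PATTERNS, 2)` keeps `["경영", "전략"]`, the two candidates that occur twice.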
For this experimental study, a Korean text database of 100 long abstracts and 200 short abstracts on business subjects was constructed manually. The study assessed how well the set of index terms produced by the automatic indexing technique reflects the major topics of each indexed document. For the evaluation, two indexers jointly constructed a manual index term list, which served as an external standard for obtaining normalized values.
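The abstract does not state which normalized measures were computed against the manual standard. As an assumption, the sketch below uses recall (the share of the manual terms that the system also produced) and precision (the share of the system's terms that appear in the manual list), treating an exact string match as a correct content identifier. The function name is hypothetical.

```python
def compare_with_standard(auto_terms, manual_terms):
    """Score automatic index terms against a manual standard list.

    Assumptions: exact string match counts as a correct identifier, and
    recall/precision stand in for the study's normalized values.
    """
    auto, manual = set(auto_terms), set(manual_terms)
    correct = auto & manual
    recall = len(correct) / len(manual) if manual else 0.0
    precision = len(correct) / len(auto) if auto else 0.0
    return sorted(correct), recall, precision


# Hypothetical automatic and manual term lists for one abstract.
correct, recall, precision = compare_with_standard(
    ["경영", "전략", "기업"],
    ["경영", "경영 전략", "기업", "마케팅"],
)
```

Here two of the four manual terms are found (recall 0.5), and two of the three automatic terms are correct (precision about 0.67). A study that credits partial matches, such as "경영" against "경영 전략", would need a looser matching rule.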
The experimental results showed that the performance of the automatic syntactico-statistical indexing system was comparable to that reported in other studies comparing automatic with manual indexing. The WDF system performed better than the IDF system in its ability to identify all the correct content identifiers, and the system produced more correct content identifiers in the short-abstract group. Overall, many significant concepts represented in the abstracts and recognized by the human indexers were extracted effectively by the automatic system, and the extracted concept forms are for the most part comparable to those of manual indexing. The study also identifies possible enhancements to the automatic syntactico-statistical indexing system that could improve indexing performance.
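The abstract compares a WDF system with an IDF system but does not give their formulas. The sketch below reads WDF as within-document frequency and IDF as inverse document frequency, the usual meanings in indexing literature, assumed here rather than taken from the study: WDF ranks a document's terms by how often they occur in it, while the IDF variant multiplies that frequency by log(N / df). The function names and the three-document collection are hypothetical.

```python
import math
from collections import Counter


def wdf_rank(doc_terms, k):
    """Rank a document's terms by within-document frequency alone."""
    counts = Counter(doc_terms)
    return sorted(counts, key=lambda t: (-counts[t], t))[:k]


def idf_rank(doc_terms, collection, k):
    """Rank terms by frequency * log(N / document frequency).

    Assumes doc_terms is one of the documents in `collection`, so every
    term has a document frequency of at least 1.
    """
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(doc))  # count each term once per document
    counts = Counter(doc_terms)
    scores = {t: f * math.log(n_docs / df[t]) for t, f in counts.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical collection of candidate identifiers from three abstracts.
collection = [
    ["경영", "경영", "전략", "기업"],
    ["경영", "마케팅"],
    ["경영", "회계"],
]
doc = collection[0]
```

`wdf_rank(doc, 1)` returns `["경영"]`, the most frequent term, whereas under the IDF weighting "경영" scores zero because it occurs in every document, so `idf_rank(doc, collection, 2)` returns `["기업", "전략"]`. This illustrates how the two schemes can disagree about which identifiers to keep.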
Issue Date: 1993
Rights Information: Copyright 1993 Seo, Eun-Gyoung
Date Available in IDEALS: 2011-05-07
Identifier in Online Catalog: AAI9329159
OCLC Identifier: (UMI)AAI9329159