Title:Information fusion in taxonomic descriptions
Author(s):Wei, Qin
Director of Research:Heidorn, P. Bryan
Doctoral Committee Chair(s):Heidorn, P. Bryan
Doctoral Committee Member(s):Smith, Linda C.; Blake, Catherine; Macklin, James
Department / Program:Library & Information Science
Discipline:Library & Information Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Information fusion
Information extraction
Abstract:Providing a single access point to information from multiple sources is helpful in many fields. As a case study, this research investigates the potential of applying information fusion techniques in the biodiversity domain, where researchers need information from different sources to support decision making on tasks such as biological identification. Furthermore, collections in this area are massive, and the descriptive materials on the same species (object) are scattered across different places, so it is not easy to manually collect them into a broader, integrated description. Floras, among the most important descriptive materials in this field, are selected as the target of this research. This research tests a hypothesis concerning the organization of text and the constancy of fact-based information in text. We observe that individual descriptions may not contain sufficient information to differentiate the target species from others, and that different information sources may contain not only overlapping information but also helpful complementary information. We also observe that non-trivial complementary information can come from descriptions at different levels (family, genus, or species) within the same source. Using a sample dataset from Flora of North America (FNA) and Flora of China (FOC), we found that about 50% of the information could be found in only a single source and that another 25% was complementary information that could be identified by fusion. Most importantly, conflicting information could only be detected by direct comparison. The question is how to fuse the records in an automatic or semi-automatic manner, so that each resulting record provides a broader yet non-redundant description of each species. The proposed system demonstrates that this is feasible with currently available techniques.
The prototype system contains four modules: Text Segmentation and Taxonomic Name Identification, Organ-level and Sub-organ-level Information Extraction, Relationship Identification, and Information Fusion. Using sample descriptions from Flora of North America and Flora of China, we demonstrate that the method achieves promising fusion results based on Cross-Description Relationships. From the evaluation results, we identify the key factors that contribute to fusion performance and discuss methods that might lead to further improvement. This study also demonstrates that the fusion approach is generalizable to a certain extent. Generalizability is a challenging problem because fusion methods are typically domain- and task-oriented. We identify the challenges that arise when applying the approach to different data sets.
Issue Date:2011-08-25
Rights Information:Copyright 2011 Qin Wei
Date Available in IDEALS:2011-08-25
Date Deposited:2011-08
