Title:Label Annotation through Biodiversity Enhanced Learning
Author(s):Heidorn, P. Bryan; Zhang, Qianjin
Contributor(s):Chong, Steven
Subject(s):semantic markup; digital curation and preservation; information retrieval; machine learning
Abstract:LABELX (Label Annotation through Biodiversity Enhanced Learning) is an extension of the HERBIS NLP system reported previously (Heidorn & Wei, 2008). The objective of the system is to formally structure the output of Optical Character Recognition (OCR) applied to the highly variable labels of natural history museum specimens. Errors are common in the OCR output, and genus and species names are particularly prone to them. Records are preprocessed using a fuzzy-match algorithm that finds genus and species names, including those with OCR errors, and replaces them with a constant token. Integers and strings that begin with alphabetic characters and end with numbers are also replaced with tokens. LABELX generates structured XML and RDF data and corrects OCR errors in some fields. The main algorithm is a Hidden Markov Model (HMM). This poster reports an enhancement to the previous system with a larger data set.
Issue Date:2013-02
Citation Info:Heidorn, P. B., & Zhang, Q. (2013). Label annotation through biodiversity enhanced learning. iConference 2013 Proceedings (pp. 882-884). doi:10.9776/13450
Genre:Conference Poster
Publication Status:published or submitted for publication
Peer Reviewed:is peer reviewed
Rights Information:Copyright © 2013 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS:2013-02-02
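
Illustration: the preprocessing step described in the abstract (fuzzy-matching taxon names and replacing numeric-looking strings with constant tokens) might look roughly like the sketch below. This is not the LABELX implementation. The tiny lexicon, the token names `<TAXON>`, `<INT>` and `<ALNUM>`, the 0.8 similarity cutoff, and the use of Python's standard `difflib` as the fuzzy matcher are all assumptions made for the example.

```python
import difflib
import re

# Hypothetical mini-lexicon; a real system would load a large taxonomic name list.
KNOWN_NAMES = ["quercus", "alba", "acer", "rubrum"]

INT_RE = re.compile(r"^\d+$")
# Begins with a letter and ends with a digit, e.g. "A12".
ALNUM_RE = re.compile(r"^[A-Za-z]+[A-Za-z0-9]*\d$")


def normalize_token(token, cutoff=0.8):
    """Map one whitespace-delimited OCR token to a constant token when it qualifies."""
    core = token.strip(".,;:()")  # ignore surrounding punctuation when classifying
    if not core:
        return token
    if INT_RE.match(core):
        return "<INT>"
    if ALNUM_RE.match(core):
        return "<ALNUM>"
    # Fuzzy match tolerates OCR errors such as "Qucrcus" for "Quercus".
    if difflib.get_close_matches(core.lower(), KNOWN_NAMES, n=1, cutoff=cutoff):
        return "<TAXON>"
    return token


def preprocess(line):
    """Return the token sequence that a downstream sequence model (e.g. an HMM) would see."""
    return [normalize_token(t) for t in line.split()]


if __name__ == "__main__":
    print(preprocess("Qucrcus alba No. 1234 A12"))
```

Collapsing names and numbers into a few constant tokens shrinks the vocabulary the sequence model has to learn, which is presumably why the abstract describes this step as preceding the HMM.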
