Description
Title: | Label Annotation through Biodiversity Enhanced Learning |
Author(s): | Heidorn, P. Bryan; Zhang, Qianjin |
Contributor(s): | Chong, Steven |
Subject(s): | OCR; parsing; semantic markup; digital curation and preservation; information retrieval; machine learning |
Abstract: | LABELX (Label Annotation through Biodiversity Enhanced Learning) is an extension of the HERBIS NLP system reported previously (Heidorn & Wei, 2008). Its objective is to formally structure the output of Optical Character Recognition (OCR) applied to the highly variable labels of natural history museum specimens. Errors are common in this OCR output, and genus and species names are particularly prone to them. Records are therefore preprocessed with a fuzzy-match algorithm that finds genus and species names, including those containing OCR errors, and replaces them with a constant token. Integers, and strings that begin with alphabetic characters and end with digits, are likewise replaced with tokens. LABELX generates structured XML and RDF data and corrects OCR errors in some fields. The main algorithm is a Hidden Markov Model (HMM). This poster reports an enhancement to the previous system using a larger data set. |
Issue Date: | 2013-02 |
Publisher: | iSchools |
Citation Info: | Heidorn, P. B., & Zhang, Q. (2013). Label annotation through biodiversity enhanced learning. iConference 2013 Proceedings (pp. 882-884). doi:10.9776/13450 |
Genre: | Conference Poster |
Type: | Text |
Language: | English |
URI: | http://hdl.handle.net/2142/42056 |
DOI: | https://doi.org/10.9776/13450 |
Publication Status: | published or submitted for publication |
Peer Reviewed: | is peer reviewed |
Rights Information: | Copyright © 2013 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors. |
Date Available in IDEALS: | 2013-02-02 |
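The token-replacement preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the LABELX implementation. The token strings (`<TAXON>`, `<INT>`, `<ALNUM>`), the function names, the two-word sliding window over "Genus species" pairs, and the 0.85 similarity cutoff are all hypothetical choices. The standard-library `difflib` similarity ratio stands in for whatever fuzzy-match algorithm the system actually uses.

```python
import difflib
import re

# Hypothetical placeholder tokens; the abstract does not say which strings LABELX uses.
TAXON_TOKEN = "<TAXON>"
INT_TOKEN = "<INT>"
ALNUM_TOKEN = "<ALNUM>"

INT_RE = re.compile(r"\d+")
# Assumed reading of "strings that begin with alphabetic characters and end with numbers".
ALNUM_RE = re.compile(r"[A-Za-z]+\d+")


def tokenize(text):
    """Split label text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def replace_taxa(tokens, binomials, cutoff=0.85):
    """Replace two-token spans that fuzzily match a known 'Genus species' name."""
    known = [name.lower() for name in binomials]
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            pair = f"{tokens[i]} {tokens[i + 1]}".lower()
            # difflib's ratio tolerates small OCR errors such as 'Quercvs' for 'Quercus'.
            if difflib.get_close_matches(pair, known, n=1, cutoff=cutoff):
                out.append(TAXON_TOKEN)
                i += 2  # consume both the genus and the species token
                continue
        out.append(tokens[i])
        i += 1
    return out


def replace_numbers(tokens):
    """Replace integers and letter-prefixed, digit-suffixed strings with tokens."""
    out = []
    for tok in tokens:
        if INT_RE.fullmatch(tok):
            out.append(INT_TOKEN)
        elif ALNUM_RE.fullmatch(tok):
            out.append(ALNUM_TOKEN)
        else:
            out.append(tok)
    return out


def preprocess(text, binomials):
    """Tokenize an OCR'd label, then apply the taxon pass and the number pass."""
    return replace_numbers(replace_taxa(tokenize(text), binomials))


if __name__ == "__main__":
    label = "Quercvs alba L. Collected 12 May 1923 by J. Smith, no. SD1234"
    print(preprocess(label, ["Quercus alba"]))
    # ['<TAXON>', 'L', '.', 'Collected', '<INT>', 'May', '<INT>', 'by', 'J', '.', 'Smith', ',', 'no', '.', '<ALNUM>']
```

In this sketch the misspelled "Quercvs alba" still maps to the same token as the correct name, so a downstream tagger such as an HMM would see one observation symbol instead of many OCR variants. This is offered as a plausible motivation, not as a description of how LABELX feeds its HMM.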