Files in this item

File: 266.pdf (240kB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections
Author(s): Paul, Deborah L.; Heidorn, P. Bryan
Subject(s): iDigBio; OCR; natural language; information analysis; machine language; information organization; information services; research methods; information retrieval; qualitative data analysis
Abstract: The Augmenting OCR Working Group (A-OCR WG) at Integrated Digitized Biocollections (iDigBio) seeks to improve community OCR strategies and algorithms for faster, better parsing of OCR output derived from valuable data on natural history collection specimen labels. This task is exceedingly difficult because museum labels are often annotated, and vary in content, form and font. Under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program, iDigBio is building a cyberinfrastructure to aggregate quality data from museum specimens housed in collections across the United States for use by researchers, educators, environmentalists and the public. Since March of 2012, the A-OCR WG formed from community consensus to begin its role in this endeavor, defining reachable goals including setting up a hackathon concurrent with iConference 2013. This paper reports on the definition of some key problems identified by the A-OCR WG since these science problems will drive research and cyberinfrastructure development.
Issue Date: 2013-02
Publisher: iSchools
Citation Info: Paul, D., & Heidorn, P. B. (2013). Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections. iConference 2013 Proceedings (pp. 514-518). doi:10.9776/13266
Genre: Conference Paper / Presentation
Type: Text
Language: English
URI: http://hdl.handle.net/2142/39427
DOI: 10.9776/13266
Publication Status: published or submitted for publication
Peer Reviewed: is peer reviewed
Rights Information: Copyright © 2013 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS: 2013-01-30

