Files in this item



471.pdf (application/pdf, 230kB)


Title: Help iDigBio reveal hidden data: iDigBio Augmenting OCR working group needs you
Author(s): Paul, Deborah L.; Heidorn, P. Bryan; Best, Jason; Gilbert, Edward; Neill, Amanda; Nelson, Gil; Ulate, William
Subject(s): natural language; information analysis; machine language; information organization; information services; research methods; information retrieval; qualitative data analysis; iDigBio hackathon
Abstract: Integrated Digitized Biodiversity Collections, iDigBio, is funded under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program to help hundreds of natural history museums get specimen data out of millions of drawers and off of specimen labels into an integrated database for everyone. Over 130 museums are working together, funded as Thematic Collection Networks (TCNs), to capture standardized data to send to iDigBio's HUB, Home Uniting Biocollections cyberinfrastructure. Optical Character Recognition (OCR) and OCR output analysis play an important role in many museum object-to-image-to-data digitization workflows and are integral to several of the current TCN digitization projects. The iDigBio Augmenting OCR working group (AOCR) was formed to develop a multi-faceted approach to improvement of OCR strategies, including investigation of image segmentation, autocorrection of typographical errors, semantic autocorrection, autonormalization, automated text segmentation, generating consensus records, and user interfaces.
Issue Date: 2013-02
Citation Info: Paul, D., Heidorn, P. B., Best, J., Gilbert, E., Neill, A., Nelson, G., & Ulate, W. (2013). Help iDigBio reveal hidden data: iDigBio Augmenting OCR working group needs you. iConference 2013 Proceedings (pp. 1019-1021). doi:10.9776/13471
Genre: Conference Paper / Presentation
Publication Status: published or submitted for publication
Peer Reviewed: is peer reviewed
Rights Information: Copyright © 2013 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS: 2013-02-04
