Files in this item

File:493.pdf (578kB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Improving the Character of Optical Character Recognition (OCR): iDigBio Augmenting OCR Working Group Seeks Collaborators and Strategies to Improve OCR Output and Parsing of OCR Output ...
Author(s):Anglin, Robert; Best, Jason; Figueiredo, Renato; Gilbert, Edward; Gnanasambandam, Nathan; Gottschalk, Stephen; Haston, Elspeth; Heidorn, P. Bryan; Lafferty, Daryl; Lang, Peter; Nelson, Gil; Paul, Deborah L.; Ulate, William; Watson, Kimberly; Zhang, Qianjin
Subject(s):iDigBio
OCR
natural language
information analysis
machine language
information organization
information services
research methods
information retrieval
qualitative data analysis
Abstract:There are an estimated 2–3 billion museum specimens worldwide (OECD 1999, Ariño 2010). In an effort to increase the research value of their collections, institutions across the U.S. have been seeking new ways to cost-effectively transcribe the label information associated with these specimen collections. Current digitization methods are still relatively slow, labor-intensive, and therefore expensive. New methods, such as optical character recognition (OCR), natural language processing, and human-in-the-loop assisted parsing, are being explored to reduce these costs. The National Science Foundation (NSF), through the Advancing Digitization of Biodiversity Collections (ADBC) program, funded Integrated Digitized Biocollections (iDigBio) in 2011 to create a Home Uniting Biodiversity Collections (HUB) cyberinfrastructure to aggregate and collectively integrate specimen data, find ways to digitize specimen data faithfully and faster, and disseminate the knowledge of how to achieve this. The iDigBio Augmenting OCR Working Group is part of this national effort to speed up the overall digitization process, lower the cost, improve overall efficiency, assure that digitized data is fit-for-use (NIBA 2010, Chapman 2005), and provide the resulting digitized data records to researchers more quickly. The iDigBio Augmenting OCR (A-OCR) working group is actively engaged in identifying opportunities for collaboration to leverage OCR tools and technologies that are successful (both within and outside of the biology digitization domain) and to disseminate these tools to the public or seek funding for development.
Issue Date:2013-02
Publisher:iSchools
Citation Info:Anglin, R., Best, J., Figueiredo, R., Gilbert, E., Gnanasambandam, N., Gottschalk, S., ... Zhang, Q. (2013). Improving the Character of Optical Character Recognition (OCR): iDigBio Augmenting OCR Working Group Seeks Collaborators and Strategies to Improve OCR Output and Parsing of OCR Output for Faster, More Efficient, Cheaper Natural History Collections Specimen Label Digitization. iConference 2013 Proceedings (pp. 957-964). doi:10.9776/13493
Genre:Conference Poster
Type:Text
Language:English
URI:http://hdl.handle.net/2142/42089
DOI:10.9776/13493
Publication Status:published or submitted for publication
Peer Reviewed:is peer reviewed
Rights Information:Copyright © 2013 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS:2013-02-03

