Title: Toward Enhanced Metadata Quality of Large-Scale Digital Libraries: Estimating Volume Time Range
Author(s): Guo, Siyuan; Edelblute, Trevor; Dai, Bin; Chen, Miao; Liu, Xiaozhong
Subject(s): data analytics and evaluation; information organization and metadata; text/data/knowledge mining
Abstract: In large-scale digital libraries, it is not uncommon for some bibliographic fields in metadata records to be incomplete or missing. Supplying this incomplete or missing metadata can greatly facilitate users' search for and access to digital library resources. Temporal information, such as publication date, is a key descriptor of digital resources. In this study, we investigate text mining methods to automatically resolve missing publication dates for the HathiTrust corpora, a large collection of documents digitized and converted to text by optical character recognition (OCR). Compared with previous approaches that use only unigrams as features, our experimental results show that methods incorporating higher-order n-gram features, e.g., bigrams and trigrams, can more effectively classify a document into discrete temporal intervals, or "chronons". Our approach can be generalized to classify volumes in other digital libraries.
Issue Date: 2015-03-15
Series/Report: iConference 2015 Proceedings
Genre: Conference Paper/Presentation
Peer Reviewed: yes
Rights Information: Copyright 2015 is held by the authors. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS: 2015-03-24
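The abstract's core idea, assigning an undated volume to a chronon using unigram, bigram, and trigram features, can be illustrated with a small sketch. This record does not state which classifier, tokenizer, or chronon boundaries the authors used, so everything below is an assumption for illustration only: the multinomial Naive Bayes classifier, the fixed 20-year chronons starting at 1800, and the names `ngram_features`, `chronon_of`, and `ChrononNB` are all hypothetical and are not the paper's method.

```python
import math
import re
from collections import Counter, defaultdict


def ngram_features(text, max_n=3):
    """Count word n-grams for n = 1..max_n (unigrams, bigrams, trigrams)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def chronon_of(year, start=1800, width=20):
    """Map a year to a hypothetical fixed-width chronon, e.g. 1834 -> '1820-1839'."""
    lo = start + ((year - start) // width) * width
    return f"{lo}-{lo + width - 1}"


class ChrononNB:
    """Multinomial Naive Bayes over n-gram counts with Laplace smoothing (assumed classifier)."""

    def __init__(self, max_n=3, alpha=1.0):
        self.max_n = max_n
        self.alpha = alpha

    def fit(self, texts, years):
        self.counts = defaultdict(Counter)  # chronon -> n-gram counts
        self.docs = Counter()               # chronon -> number of training volumes
        for text, year in zip(texts, years):
            c = chronon_of(year)
            self.docs[c] += 1
            self.counts[c].update(ngram_features(text, self.max_n))
        self.vocab = set()
        for counter in self.counts.values():
            self.vocab.update(counter)
        self.totals = {c: sum(v.values()) for c, v in self.counts.items()}
        return self

    def predict(self, text):
        feats = ngram_features(text, self.max_n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.docs:
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(self.docs[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for gram, k in feats.items():
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += k * math.log((self.counts[c][gram] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy usage with made-up snippets (not HathiTrust data).
train_texts = [
    "the steam engine and the railway company",
    "the railway company laid new track",
    "the radio broadcast and the motor car",
    "a motor car on the radio broadcast",
]
train_years = [1845, 1852, 1925, 1931]
model = ChrononNB().fit(train_texts, train_years)
print(model.predict("news of the railway company"))  # → 1840-1859
```

In this sketch, phrases such as "the railway company" only contribute evidence when trigrams are included, which mirrors the abstract's point that higher-order n-grams carry temporal signal that unigrams alone may miss. Real OCR output would also need noise handling and a much larger vocabulary.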
