Files in this item



application/pdf JJett_CAS_Thesis.pdf (713kB)


Title: Supplementing OAI-PMH in the IMLS Digital Collections & Content Aggregation
Author(s): Jett, Jacob
digital aggregations
harvesting digital collections
website HTML parsing
broadcast metasearch
Abstract: The rate of adoption of OAI-PMH among the IMLS DCC (Digital Collections & Content) data providers remains a modest 23%. As a result, large quantities of item-level metadata records cannot be harvested into the DCC aggregation’s item-level metadata repository. This thesis explores alternative methods of harvesting item-level metadata, either through the use of website HTML parsing technologies to capture metadata directly from webpages and permanently store it as XML files, or through the use of broadcast metasearch technologies to provide additional links to information resources within the DCC’s search results page. The nature of “collections” is also explored, and a classification system based on the nature of the “items” within each collection is constructed in order both to better understand the contents of the DCC aggregate and to facilitate the prediction of experiment outcomes. While labor intensive with regard to the need to construct metadata standard crosswalks and retool harvesting code, website HTML parsing is found to be a powerful tool for both increasing the rate of item-level metadata repository growth and enhancing the choices for both aggregate users and collection developers. While broadcast metasearch experiments were inconclusive, several emerging applications of broadcast metasearch technology may be promising methods of supplementing the contents of the aggregate’s item-level metadata repository.
Issue Date: 2010-11-16
Genre: Dissertation / Thesis
Publication Status: unpublished
Peer Reviewed: not peer reviewed
Date Available in IDEALS: 2010-11-16
