Title:Proposal for Persistent & Unique Entity Identifiers
Author(s):Jett, Jacob; Ruan, Guangchen; Unnikrishnan, Leena; Fallaw, Colleen; Maden, Christopher; Cole, Timothy
Subject(s):system architecture; HathiTrust Research Center; HathiTrust Digital Library; digital libraries
Abstract:This proposal argues for the establishment of persistent and unique identifiers for page-level content. The page is a key conceptual entity within the HathiTrust Research Center (HTRC) framework. Volumes are composed of pages, and pages are the units of data that the HTRC’s analytics modules consume and execute algorithms across. The need for infrastructure that supports persistent and unique identity for pages is best described by seven use cases: 1. Persistent Citability: Scholars engaging in the analysis of HTRC resources have a clear need to cite those resources in a persistent manner, independent of those resources’ relative positions within other entities. 2. Point-in-time Citability: Scholars engaging in the analysis of HTRC resources have a clear need to cite resources in an unambiguous way that is persistent with respect to time. 3. Reproducibility: Scholars need methods for sharing the resources they cite so that their work conforms to the norms of peer review and reproducibility of results. 4. Supporting “non-consumptive” Usage: Anonymizing page-level content by disassociating it from the volumes it is conceptually a part of makes it harder to use HTRC analytics modules to directly reproduce HathiTrust (HT) content. 5. Improved Granularity: Since many features that scholars are interested in exist at the conceptual level of a page rather than at the level of a volume, unique page-level entities expand the ways in which worksets can be gathered and analytics modules can be constructed. 6. Expanded Workset Membership: In the near future we would like to give scholars options for creating worksets from arbitrary resources at arbitrary levels of granularity, including worksets built from collections of arbitrary pages. 7. Supporting Graph Representations: Unique identifiers for page-level content facilitate the creation of more conceptually accurate and functional graph representations of the HT corpus. There are several ways
Issue Date:2014-08-22
Publisher:University of Illinois
Citation Info:Jett, J., Ruan, G.C., Unnikrishnan, L., Fallaw, C., Maden, C., Cole, T. (2014). Proposal for persistent and unique identifiers. Unpublished technical report to the HathiTrust Research Center Advisory Board. University of Illinois, Champaign, IL.
Genre:Technical Report
