Files in this item



WCSA_DC_Final_Public.pdf (application/pdf, 880kB)
Full public version of research proposal


Title:Workset Creation for Scholarly Analysis and Data Capsules (WCSA+DC): Laying the foundations for secure computation with copyrighted data in the HathiTrust Research Center, Phase I
Author(s):Downie, J. Stephen; Plale, Beth; McDonald, Robert; Namachchivaya, Beth Sandore; Unsworth, John; Cole, Timothy W.
Contributor(s):Dubnicek, Ryan; Ma, Yu "Marie"; Underwood, Ted; Pustejovsky, James; Verhagen, Marc; Hinze, Annika; Page, Kevin; Green, Harriett
Subject(s):data capsules
text data mining
digital humanities
non-consumptive research
secure computing
computational linguistics
Abstract:The primary objective of the WCSA+DC project is the seamless integration of the workset model and tools with the Data Capsule framework to provide non-consumptive research access to HathiTrust’s massive corpus of data objects, securely and at scale, regardless of copyright status. That is, we plan to surmount the copyright wall on behalf of scholars and their students. Notwithstanding the substantial preliminary work that has been done on both the WCSA and DC fronts, they are both still best characterized as being in the prototyping stages. It is our intention that this proposed Phase I of the project devote an intense two-year burst of effort to move the suite of WCSA and DC prototypes from the realm of proof-of-concept to that of a firmly integrated at-scale deployment. We plan to concentrate our requested resources on making sure our systems are as secure and robust at scale as possible. Phase I will engage four external research partners. Two of the external partners, Kevin Page (Oxford) and Annika Hinze (Waikato), were recipients of WCSA prototyping sub-awards. We are very glad to propose extending and refining aspects of their prototyping work in the context of WCSA+DC. Two other scholars, Ted Underwood (Illinois) and James Pustejovsky (Brandeis), will play critical roles in Phase I as active participants in the development and refinement of the tools and systems from their particular user-scholar perspectives: Underwood, Digital Humanities (DH); Pustejovsky, Computational Linguistics (CL). The four key outcomes and benefits of the WCSA+DC, Phase I project are: 1. The deployment of a new Workset Builder tool that enhances search and discovery across the entire HathiTrust Digital Library (HTDL) by complementing traditional volume-level bibliographic metadata with new metadata derived from a variety of sources at various levels of granularity. 2. The creation of Linked Open Data resources to help scholars find, select, integrate and disseminate a wider range of data as part of their scholarly analysis life-cycle. 3. A new Data Capsule framework that integrates worksets, runs at scale, and does both in a secure, non-consumptive manner. 4. A set of exemplar pre-built Data Capsules that incorporate tools commonly used by both the DH and CL communities that scholars can then customize to their specific needs.
Issue Date:2015-12-16
Sponsor:Andrew W. Mellon Foundation, grant no. 41500672
Date Available in IDEALS:2018-02-01

This item appears in the following Collection(s)

  • Illinois Research and Scholarship
    This is the default collection for all research and scholarship developed by faculty, staff, or students at the University of Illinois at Urbana-Champaign
