Illuminating the large-scale digital library with TORCHLITE from the HathiTrust Research Center
Layne-Worthey, Glen; Downie, J. Stephen; Walsh, John A.; Dubnicek, Ryan; Swatscheno, Janet; Satheesan, Sandeep Puthanveetil; Kudeki, Deren; Liyanage, Samitha
Permalink
https://hdl.handle.net/2142/132993
Description
Title
Illuminating the large-scale digital library with TORCHLITE from the HathiTrust Research Center
Author(s)
Layne-Worthey, Glen
Downie, J. Stephen
Walsh, John A.
Dubnicek, Ryan
Swatscheno, Janet
Satheesan, Sandeep Puthanveetil
Kudeki, Deren
Liyanage, Samitha
Issue Date
2026-03-12
Keyword(s)
Digital libraries
Digital humanities
Open data
Copyright & fair use
Community engagement
Abstract
Introduction. Two closely related efforts to make very large-scale digital library data (representing tens of millions of items, including in-copyright sources) freely, openly, and more easily available for legally sound “non-consumptive” research are described.
Innovations and methods. We introduce first a newly expanded release of HathiTrust Research Center Extracted Features, a large and innovative digital library dataset; and second, a newly created framework, API, web-based dashboard, and set of tools called “TORCHLITE” (Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction) that greatly lower the barrier to entry for using the data in that dataset.
Analysis & results. We also describe a public hackathon held as part of the pre-release activities for the new TORCHLITE framework and API. The event expanded awareness of the research dataset, produced several experimental technical innovations and a set of working tools for the framework, and yielded a focused collection of user-experience feedback.
Conclusion. Massive digital libraries present a host of problems (of scale, of access restrictions, of analysis, and more), but our efforts to ameliorate these problems in the 19-million-volume HathiTrust digital library have lowered many barriers to its use.
Publisher
iSchools
Series/Report Name or Number
iConference 2026 Proceedings
Type of Resource
Other
Genre of Resource
Conference Poster
Language
eng
Copyright and License Information
Copyright 2026 is held by Glen Layne-Worthey, J. Stephen Downie, John A. Walsh, Ryan Dubnicek, Janet Swatscheno, Sandeep Puthanveetil Satheesan, Deren Kudeki, and Samitha Liyanage. Copyright permissions, when appropriate, must be obtained directly from the authors.