Files in this item

Files                           Description                            Format
HT-BW_WhitePaper.pdf (361kB)    HathiTrust+Bookworm Technical Report   PDF (application/pdf)

Description

Title:Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust+Bookworm Project Technical Report
Author(s):Downie, J Stephen; Lieberman-Aiden, Erez
Contributor(s):Organisciak, Peter; Schmidt, Benjamin; Bhattacharyya, Sayan; Jett, Jacob
Subject(s):HathiTrust; Natural Language Processing; Metadata; Data Visualization
Abstract:Bookworm is a tool that visualizes language usage trends at large scales, designed to be powerful but simple. It allows multi-faceted slicing and dicing of the data against a set of content-based and metadata-based features. Our recent work with the HathiTrust+Bookworm (HT+BW) project has focused on improving Bookworm's ability to scale for large collections, while supporting an implementation of Bookworm over one of the largest digital book collections: the HathiTrust Digital Library. The implementation allows scholars to explore the full HathiTrust corpus — but with the control to compare on the basis of such features as subject classification, place of publication, genre, and language. It also provides tools for improved future implementations of Bookworm over non-HathiTrust collections.
Issue Date:2017-12-07
Publisher:Center for Informatics Research in Science and Scholarship, School of Information Sciences, University of Illinois at Urbana-Champaign
Citation Info:Downie, J. S. & Lieberman, E. (2017). Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust+Bookworm Project Technical Report. Illinois: Center for Informatics Research in Science and Scholarship, School of Information Sciences, University of Illinois at Urbana-Champaign
Genre:Technical Report
Type:Text
Language:English
URI:http://hdl.handle.net/2142/112750
Sponsor:National Endowment for the Humanities (#HK-50176-14)
Date Available in IDEALS:2021-11-16

