Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust+Bookworm Project Technical Report
Downie, J Stephen; Lieberman-Aiden, Erez
Permalink
https://hdl.handle.net/2142/112750
Description
- Title
- Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust+Bookworm Project Technical Report
- Author(s)
- Downie, J Stephen
- Lieberman-Aiden, Erez
- Contributor(s)
- Organisciak, Peter
- Schmidt, Benjamin
- Bhattacharyya, Sayan
- Jett, Jacob
- Issue Date
- 2017-12-07
- Keyword(s)
- HathiTrust
- Natural Language Processing
- Metadata
- Data Visualization
- Date of Ingest
- 2021-11-16T15:10:35Z
- Abstract
- Bookworm is a tool that visualizes language usage trends at large scales, designed to be powerful but simple. It allows multi-faceted slicing and dicing of the data against a set of content-based and metadata-based features. Our recent work with the HathiTrust+Bookworm (HT+BW) project has focused on improving Bookworm's ability to scale for large collections, while supporting an implementation of Bookworm over one of the largest digital book collections: the HathiTrust Digital Library. The implementation allows scholars to explore the full HathiTrust corpus — but with the control to compare on the basis of such features as subject classification, place of publication, genre, and language. It also provides tools for improved future implementations of Bookworm over non-HathiTrust collections.
- Publisher
- Center for Informatics Research in Science and Scholarship, School of Information Sciences, University of Illinois at Urbana-Champaign
- Type of Resource
- text
- Genre of Resource
- technical report
- Language
- en
- Sponsor(s)/Grant Number(s)
- National Endowment for the Humanities (#HK-50176-14)
Owning Collections
Research Projects - CIRSS PRIMARY