The Gutenberg-HathiTrust Parallel Corpus: A Real-World Dataset for Noise Investigation in Uncorrected OCR Texts
Jiang, Ming; Hu, Yuerong; Worthey, Glen; Dubnicek, Ryan C.; Capitanu, Boris; Kudeki, Deren; Downie, J. Stephen
Permalink: https://hdl.handle.net/2142/109695