Files in this item



application/pdf: Dubnicek-Ryan_20180416_V01.pdf (286kB)


Title: Creating A Disability Corpus for Literary Analysis: Pilot Classification Experiments
Author(s): Dubnicek, Ryan; Underwood, Ted; Downie, J. Stephen
Subject(s): distant reading; digital humanities; disability in literature
Abstract: As literary text opens to researchers for distant reading (the computational analysis of large corpora of text for literary scholarship), problems beyond typical data science roadblocks, such as data scale and the statistical significance of findings, have emerged. For scholars studying character and social representation in literature, identifying the characters that belong to the classes under study is crucial, painstaking, and often manual. For characters with disabilities, manual identification is prohibitively difficult to undertake at scale and is especially challenging given the coded textual markers that can be used to refer to disability. No corpus of fictional characters with disabilities currently exists, and building one is the first step toward at-scale computational study of this topic. This project pilots a classification process using manually assigned ground truth on a subset of volumes from the HathiTrust. Having successfully built and evaluated a Naïve Bayes classifier, we suggest full-scale deployment of a statistical classifier on a large corpus of literature in order to assemble a disability corpus. This project also covers preliminary exploratory textual analysis of characters with disabilities to yield potential research questions for further exploration.
Issue Date: 2018
Series/Report: iConference 2018 Proceedings
Genre: Conference Poster
Rights Information: Copyright 2018 is held by Ryan Dubnicek, Ted Underwood, J. Stephen Downie. Copyright permissions, when appropriate, must be obtained directly from the authors.
Date Available in IDEALS: 2018-07-12
