Files in this item

File:KEHOE-DISSERTATION-2019.pdf (4MB)
Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Predicting controlled vocabulary based on text and citations: Case studies in medical subject headings in MEDLINE and patents
Author(s):Kehoe, Adam K.
Director of Research:Torvik, Vetle I
Doctoral Committee Chair(s):Torvik, Vetle I
Doctoral Committee Member(s):Smalheiser, Neil R; Dubin, David S; Ludäscher, Bertram; Downie, John S
Department / Program:Information Sciences
Discipline:Library & Information Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):Controlled vocabulary; Medical Subject Headings; Controlled Vocabulary Prediction
Abstract:This dissertation makes three contributions in the area of controlled vocabulary prediction of Medical Subject Headings. The first contribution is a new partial matching measure based on distributional semantics. The second contribution is a probabilistic model based on text similarity and citations. The third contribution is a case study of cross-domain vocabulary prediction in US Patents. Medical Subject Headings (MeSH) are an important life sciences controlled vocabulary. They are an ideal ground to study controlled vocabulary prediction due to their complexity, hierarchical nature, and practical significance. The dissertation begins with an updated analysis of human indexing consistency in MEDLINE. This study demonstrates the need for partial matching measures to account for indexing variability. Here, I develop four measures combining the MeSH hierarchy and contextual similarity. These measures provide several new tools for evaluating and diagnosing controlled vocabulary models. Next, a generalized predictive model is introduced. This model uses citations and abstract similarity as inputs to a hybrid KNN classifier. Citations and abstracts are found to be complementary in that they reliably produce unique and relevant candidate terms. Finally, the predictive model is applied to a corpus of approximately 65,000 biomedical US patents. This case study explores differences in the vocabulary of MEDLINE and patents, as well as the prospect for MeSH prediction to open new scholarly opportunities in economics and health policy research.
Issue Date:2019-07-09
Type:Text
URI:http://hdl.handle.net/2142/105645
Rights Information:Copyright 2019 Adam Kehoe
Date Available in IDEALS:2019-11-26
Date Deposited:2019-08

