Files in this item

File: PAMPARI-THESIS-2018.pdf (647kB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: emrQA: A large corpus for question answering on electronic medical records
Author(s): Pampari, Anusri
Advisor(s): Peng, Jian
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Electronic Medical Records, Question Answering, Logical Forms, Semantic Parsing, Dataset Generation, Closed Domain, i2b2
Abstract: We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping.
Issue Date: 2018-12-11
Type: Thesis
URI: http://hdl.handle.net/2142/102500
Rights Information: Accepted at Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018
Date Available in IDEALS: 2019-02-06
Date Deposited: 2018-12

