Title: | emrQA: A large corpus for question answering on electronic medical records |
Author(s): | Pampari, Anusri |
Advisor(s): | Peng, Jian |
Department / Program: | Computer Science |
Discipline: | Computer Science |
Degree Granting Institution: | University of Illinois at Urbana-Champaign |
Degree: | M.S. |
Genre: | Thesis |
Subject(s): | Electronic Medical Records, Question Answering, Logical Forms, Semantic Parsing, Dataset Generation, Closed Domain, i2b2 |
Abstract: | We propose a novel methodology for generating large-scale, domain-specific question answering (QA) datasets by re-purposing existing annotations created for other NLP tasks. We demonstrate an instance of this methodology by generating a large-scale QA dataset for electronic medical records, leveraging existing expert annotations on clinical notes for various NLP tasks from the community-shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form pairs and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question-to-logical-form and question-to-answer mapping. |
Issue Date: | 2018-12-11 |
Type: | Text |
URI: | http://hdl.handle.net/2142/102500 |
Rights Information: | Accepted at the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018 |
Date Available in IDEALS: | 2019-02-06 |
Date Deposited: | 2018-12 |