Files in this item



audio/mpeg, blake.mp3 (6MB)
Audio file, mp3 audio


application/ (302kB)
PowerPoint Presentation/Slides, Microsoft PowerPoint


Title: Beyond Genes, Proteins, and Abstracts: Identifying Scientific Claims from Full-Text Biomedical Articles
Author(s): Blake, Catherine
Subject(s): natural language processing methods, annotated collections
Abstract: Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework, which reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing underspecified claims such as correlations, comparisons, and observations. The results from twenty-nine full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, thus revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%), or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims, and we measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.
Issue Date: 2010
Publisher: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Genre: Presentation / Lecture / Speech
Date Available in IDEALS: 2012-03-12
