Files in this item

File: LL-Zhang-Poster.pdf (782kB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Towards more transparent, reproducible, and reusable data cleaning with OpenRefine
Author(s): Li, Lan; Ludäscher, Bertram; Zhang, Qian
Subject(s): OpenRefine; Data cleaning; Provenance; Transparency; Reproducibility; Reusability
Abstract: We study provenance features of OpenRefine, a popular data cleaning tool. In OpenRefine, provenance is available through operation histories and recipes. The former provide users with an undo/redo capability; the latter represent histories in JSON, so recipes can be reused. The model implicit in histories and recipes exhibits both prospective and retrospective provenance features, but is incomplete in at least two ways: (i) functions resulting in mass edits, and (ii) single cell edits are not captured, thus missing important prospective and retrospective provenance information, respectively. We propose to complete the missing information by capturing names and parameters of user-invoked functions, and by exposing retrospective provenance hidden in internal project files. The feasibility of the approach is demonstrated with an early prototype.
Issue Date: 2019-03-15
Publisher: iSchools
Series/Report: iConference 2019 Proceedings
Genre: Conference Poster
Type: Text
Language: English
URI: http://hdl.handle.net/2142/103330
DOI: https://doi.org/10.21900/iconf.2019.103330
Rights Information: Copyright 2019 Lan Li, Bertram Ludäscher, and Qian Zhang
Date Available in IDEALS: 2019-03-22
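
The abstract describes recipes as JSON representations of operation histories and proposes capturing the names and parameters of user-invoked functions. As a rough illustration only, and not the authors' prototype, the Python sketch below reads a recipe and lists each operation's name and parameters. The sample recipe is invented for this example; its field names (`op`, `description`, `columnName`, `expression`, `edits`) follow the general shape of an exported OpenRefine operation history but should be treated as assumptions, and `summarize_recipe` is a hypothetical helper.

```python
import json

# Illustrative recipe, invented for this sketch. Field names are assumed to
# resemble an exported OpenRefine operation history; real recipes may differ.
SAMPLE_RECIPE = """
[
  {
    "op": "core/text-transform",
    "columnName": "name",
    "expression": "value.trim()",
    "description": "Text transform on cells in column name"
  },
  {
    "op": "core/mass-edit",
    "columnName": "city",
    "expression": "value",
    "edits": [{"from": ["NYC", "N.Y.C."], "to": "New York"}],
    "description": "Mass edit cells in column city"
  }
]
"""


def summarize_recipe(recipe_text):
    """Return one entry per operation: step number, function name, parameters."""
    operations = json.loads(recipe_text)
    summary = []
    for step, operation in enumerate(operations, start=1):
        # Everything except the operation name and its human-readable
        # description is treated as a parameter of the invoked function.
        parameters = {
            key: value
            for key, value in operation.items()
            if key not in ("op", "description")
        }
        summary.append(
            {
                "step": step,
                "function": operation.get("op", "unknown"),
                "parameters": parameters,
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize_recipe(SAMPLE_RECIPE):
        print(entry["step"], entry["function"], sorted(entry["parameters"]))
```

A summary like this covers only what the recipe itself records (prospective provenance). The per-cell changes the abstract calls retrospective provenance would have to come from OpenRefine's internal project files, which this sketch does not read.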

