Files in this item

DFDL_Lessons_Learned.pdf (application/pdf, 2MB)

Title: Data Format Description Language: Lessons Learned, Concepts and Experience
Author(s): McGrath, Robert E.
Subject(s): Data Format Description Language
XML Schema
Open Grid Forum Standard
Abstract: For the past six years, as part of the “Innovative Systems and Software: Applications to NARA Research Problems” project, NCSA has contributed to the development of the Open Grid Forum (OGF) standard format description language, the Data Format Description Language (DFDL). A DFDL parser is sufficient to support interpretation of arbitrary binary or ASCII-formatted files in terms of well-defined logical models.

The Data Format Description Language emerged from a variety of unrelated projects and products, which had various goals and approaches. The goal of the OGF DFDL-WG is to build on previous experience to create a consensus standard that can replace the disparate related efforts. In 2011, the DFDL specification was accepted as a “Proposed Recommendation” of the Open Grid Forum.

The DFDL is a critical new technology for many important use cases, including:

• Access and manipulation of non-XML data, such as data from sensors or simulations
• Interoperation of data from many independent sources
• Preservation of access to data for long periods of time
• Construction and access to “virtual datasets” from many sources

This capability is especially interesting for archives that need to preserve access to data for long periods of time.

Beyond maintaining the accessibility of the raw ‘1’s and ‘0’s of digital data, preservation and interoperation require maintaining the ability to interpret the data as meaningful structures, relationships, and visual representations. NCSA has investigated concepts for a general descriptive method for accessing data in arbitrary file formats and presenting the interpreted information in XML and RDF representations, supporting discovery and long-term preservation of content.

This technology has broad application across the curation and preservation processes, and in e-Science more generally. The DFDL has been identified by the US National Archives and Records Administration (NARA) as a priority in the area of Human Computer Interaction and Information Management.

This project has included contributions to the development of the DFDL standard, test implementations of the concepts, and explorations of semantic extensions for DFDL. This document summarizes the activities and presents some lessons learned in the course of this project.

Issue Date: 2011-09
Genre: Technical Report
Publication Status: unpublished
Peer Reviewed: not peer reviewed
Sponsor: National Science Foundation Cooperative Agreement NSF OCI 05-25308
Cooperative Support Agreements NSF OCI 04-38712 and NSF OCI 05-04064 by the National Archives and Records Administration
Date Available in IDEALS: 2011-10-06

This item appears in the following Collection(s)

  • Illinois Research and Scholarship
    This is the default collection for all research and scholarship developed by faculty, staff, or students at the University of Illinois at Urbana-Champaign
