Title: Assessing Need for an Automated File Format Obsolescence Warning System for Digital Collections
Author(s): Bowden, Heather Louise Mae
Subject(s): digital curation
Abstract: Anecdotal evidence reported in the literature and personal discussions with managers of digital archives suggest that one of the greatest hindrances to the successful preservation of resources in digital archives is the high volume of repeatable activities that must be performed to monitor the viability of digital collections over time [1] [2] [3]. Equally troublesome is the rate at which digital file formats become “obsolete,” that is, unreadable by current computer software and/or hardware. It is not currently clear which tools should be developed to best ameliorate these issues, or how severe the actual need for such tools is in the digital archives environment. At present there is no fully functioning system that can detect impending file format obsolescence and notify digital archives managers of it. For preservation systems to evolve and grow in step with the changing technological landscape, they need a way to dynamically monitor changes and react to them, if necessary, as they occur. Static systems with rigid controls of data flow have no ability to monitor, adapt, and grow as the sands of technology shift. ‘Community watch and participation’ is a key component of the DCC Curation Lifecycle [5], but has yet to be formally applied to functions in developing preservation systems. To begin designing tools that will aid in the management and preservation of digital collections, the first step is to engage with the community of digital collection managers and learn directly from them about their needs in this arena. Using the principles of user-centered design, the following study was conducted as a first step in the iterative design process to create an automated file format obsolescence warning system. This is part of the initial design phase of “collecting critical information about users,” [4] which will lead to the iterative cycle of design, test and measure, and redesign.
This study seeks to answer three research questions: 1) What types of file formats are currently being managed in digital collections? 2) What methods are digital collection managers currently employing to sustain their collections over time? 3) What types of tools (automated or otherwise) can help digital collection managers sustain their collections over time? The information collected from this study will be used to inform the development of a file format obsolescence warning system that will make use of collective intelligence and community participation to dynamically monitor and report on changes in file format viability. Data were collected through semi-structured phone interviews with managers of digital collections and were qualitatively analyzed using grid analysis techniques to assess patterns, consensus, and outlier information about their collections, preservation practices, and needs for tools in managing file format obsolescence. Nine participants took part in this study, all of them professionals responsible for the management of a digital collection. They all answered questions about the digital collections they managed. These questions fell into six broad categories: 1. Which file formats are you currently managing? 2. For how long are you intending to or required to preserve the digital items in your collection? 3. What aspects of your digital collection are most important to preserve? 4. What measures do you take or activities do you currently perform to manage file format obsolescence in your collections? 5. Would an automated file format obsolescence notification system be helpful? 6. What other tools could help you? The following generalized answers to these questions are being applied to further research and tool development.
The file formats managed varied widely across collections, although the most common formats (TIFF and PDF) were found in almost all of them. The respondents were most concerned about preserving the more obscure file formats such as DBASE and Déjà Vu. Every collection manager reported that the items in their collections were expected to be preserved indefinitely. Each digital collection specified different properties of the digital objects that needed to be preserved. Even within the same collection, different properties were important to preserve in different contexts. A wide range of digital preservation activities was being performed across the collections, from “nothing” to “educate the data producers” and the implementation of a migration-on-ingest program. While every participant responded affirmatively that they could benefit from an automatic file format obsolescence notification system, they all had different visions of how it could be implemented in their workflow. Other desired tools were automatic validation and authenticity checking functions and automatic migration functions. The study results point to the need for an automatic file format obsolescence/endangerment notification system that can assess a wide range of file formats for an indefinite period of time. The system must also allow for granular user controls that can be applied not only at the institutional level, but also at the level of individual use cases. Most importantly, any such system must be able to evolve and change in step with the technological landscape it is monitoring. Development of a prototype system to address these needs will begin in the summer of 2010. A proposed, high-level conceptualization of this system is shown in Figure 1.
In this model, a technology watch component consists of a collective intelligence unit and sorting and analyzing algorithms, which work together to produce a list of file formats and their endangerment warning levels. The collective intelligence component consists of data pulled or “crawled” from websites as well as data informed by a combination of loose social networks and tight, predetermined social networks. Collective intelligence has been generally defined as, “when a group of individuals collaborate or compete with each other, intelligence or behavior that otherwise didn’t exist suddenly emerges.” [6] When referring to technology, it has also been said to be the “combining of behavior, preferences, or ideas of a group of people who create novel insights.” [7] The sorting and analyzing algorithms are based on the CUSUM algorithms used by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT) system, which analyze data collected from several online sources in order to detect outbreaks of infectious diseases. [8] The output of these two components is the dynamically generated and updated list of file formats and their endangerment ratings. Input from the predetermined social networks is used to refine the list to the specific needs of the group, and input from individual digital collection managers is used to refine the list further for the needs of their institutions and individual use cases. Individual digital collection managers may also inform the system less directly by sharing their knowledge and experience via any channel on the World Wide Web. By using collective intelligence methods and models developed for other early warning systems, it will be possible to provide more timely and relevant file format endangerment warnings to digital collection managers.
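To make the CUSUM idea concrete, the following is a minimal sketch of a textbook one-sided (upper) CUSUM detector applied to a series of per-period signal counts for a single file format. It is not the NC DETECT implementation, whose exact variant and parameters are not described here. The function name `cusum_alerts`, the parameters `baseline_len`, `k`, and `h`, and the idea of counting weekly trouble reports for a format are all assumptions made for illustration.

```python
from statistics import mean, stdev


def cusum_alerts(counts, baseline_len=4, k=0.5, h=4.0):
    """Return the indices at which a one-sided upper CUSUM exceeds h.

    counts       -- per-period signal counts for one file format
                    (hypothetical, e.g. weekly reports of trouble opening it)
    baseline_len -- number of leading periods used to estimate the
                    in-control mean and standard deviation (assumed value)
    k            -- slack, in standard deviations, ignored each period
    h            -- decision threshold, in standard deviations
    """
    baseline = counts[:baseline_len]
    mu = mean(baseline)
    # Fall back to 1.0 so a perfectly flat baseline does not divide by zero.
    sigma = stdev(baseline) or 1.0

    s = 0.0
    alerts = []
    for i, x in enumerate(counts[baseline_len:], start=baseline_len):
        z = (x - mu) / sigma          # standardized deviation from baseline
        s = max(0.0, s + z - k)       # accumulate only upward drift
        if s > h:
            alerts.append(i)
            s = 0.0                   # restart accumulation after an alert
    return alerts


if __name__ == "__main__":
    weekly_reports = [2, 3, 2, 3, 2, 3, 9, 10]  # made-up data
    print(cusum_alerts(weekly_reports))
```

Because the statistic accumulates small excesses over the baseline, a sustained modest rise in reports can trigger a warning even when no single period looks unusual, which is the property that makes CUSUM attractive for early warning.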
This system design allows for the inclusion of all file formats and also allows specifications to be changed at an individual and context-specific level. The information collected in the participant interviews shows that these capabilities are relevant to participants' needs and important in the design of a file format endangerment warning system, and so they have been incorporated into the first design stage of this project. Further research and user testing will be conducted as test systems are implemented.
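The layered refinement described in the abstract (a community-wide list, refined for a predetermined group, then for an institution, then for a single use case) can be pictured as a lookup that prefers the most specific rating available. This is only an illustrative sketch: the function `resolve_rating`, the layer names, and the rating labels are hypothetical and do not come from the proposed system.

```python
def resolve_rating(fmt, base, group=None, institution=None, use_case=None):
    """Return the most specific endangerment rating recorded for fmt.

    Each argument after fmt is a dict mapping a format name to a rating.
    Layers are checked from most specific (use case) to least specific
    (the community-wide base list); None is returned if no layer knows fmt.
    """
    for layer in (use_case, institution, group, base):
        if layer and fmt in layer:
            return layer[fmt]
    return None


if __name__ == "__main__":
    # All ratings below are made-up examples.
    base = {"TIFF": "low", "PDF": "low", "DBASE": "high"}
    group = {"DBASE": "critical"}          # a predetermined network's view
    institution = {"PDF": "medium"}        # one archive's local override
    use_case = {"PDF": "high"}             # e.g. a single collection's need

    for fmt in ("TIFF", "PDF", "DBASE"):
        print(fmt, resolve_rating(fmt, base, group, institution, use_case))
```

Keeping each layer as a separate mapping means the community-wide list can continue to change underneath while local overrides stay in place.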
Issue Date: 2010-02-03
Genre: Conference Poster
Date Available in IDEALS: 2010-03-01
