Barbara Buttenfield
University of Colorado
babs@whitney.colorado.edu
Robert Sandusky
University of Illinois at Urbana-Champaign
sandusky@alexia.lis.uiuc.edu
This session is intended to be an exploration of a set of issues that haven't, for the most part, gotten a lot of play in the DL research agenda. As far as I know, few answers, even provisional ones, are available for many of these issues. So, I hope that this will be an opportunity for us to explore and raise questions. Also, by sharing experiences, we might be able to get a sense of where our own projects are in terms of some of these management issues, and talk about what has been done regarding DL management, and what remains to be done. Finally, we can consider the future, and explore the ways in which these management issues fit into our research and design plans.
This session will begin with me taking about fifteen minutes to make some remarks to help frame the discussion that will follow. I hope that this is just a jumping-off point, and that, during our discussion, some of what I say will be challenged, and much will be added.
Before I go further, I feel I need to tell you a little about my background. I worked for a long time in an organization that specialized in designing, building, and managing large scale distributed information systems. That experience is an important influence on what I'm going to say this morning. I don't, however, have corresponding experience in managing libraries, so I'll rely upon you to point out any misconceptions I might express. I am looking for a bridge between my past experience and my current work associated with digital libraries.
My view of these issues is from the 'outside - in'. That is, I'm thinking about the DL as something to be managed by people and institutions. Also, as the subtitle "Where are we headed" suggests, these remarks are biased toward thinking about what it might mean to manage a DL that is running in production as opposed to thinking about how to manage the implementation of or transition to a DL. My remarks will not be explicit concerning what anyone should do to manage a DL. The intent is to raise a wide range of issues that might be relevant to particular situations of use.
The fundamental issue concerns, I think, the sustainability of the DL's we are now designing, building, and evaluating. Who is considering the long-term viability of these systems? Is this a case where systems are being built and then thrown over the wall only to be forgotten by their designers?
How is success defined for a DL project? We might define success as the development of technical innovations or the use of untried technology, as high levels of use and satisfaction among users, or in terms of outcomes measured during the project period. A different way to judge the various current projects is to look at how they fare a couple of years from now, after the original designers, developers, and researchers have moved on to new projects, and when these 'cutting edge' DL's become tomorrow's legacy systems.
What needs to be managed in a DL? First, we need to think about what a DL is. I subscribe to the view that DL's consist of more than digital items. This is enormously important especially when you are considering the integration of DL's into existing libraries with their own legacy systems like books, catalogs, indexes, and other online information systems.
There are at least five senses in which we use the term DL: as a collection of materials, a set of services, an institution, a set of technologies, and a place [Levy and Marshall, 1994]. Each of these implies some degree of management.
In terms of a collection, there are files, images, documents, bibliographic records, metadata, and so on. If this stuff is all in the collection, what might we need to manage?
I'll give an example. The Illinois DL consists of multiple repositories of digitized scientific journals. At a coarse level, you might have individual repositories or databases that need to be managed. At a finer level you might have volumes or issues of a particular journal. The next finest level might be individual journal articles. In the Illinois DL, each journal article is composed of multiple individual files, so management could be done at the file level. What level of granularity is appropriate when you consider what it means to manage this collection? It's conceivable that a DL management organization might be concerned with multiple levels of granularity depending upon role and context of use: users, like the scientists who read these journals, might focus on articles a lot; librarians might tend to focus on journal issues and volumes, titles and subscriptions; and the system administrators might tend to focus on files. The focus of a particular person could change when, for example, a librarian helps a user find a particular article.
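The levels of granularity in this example can be sketched as a small data model. This is only an illustration in Python, not a description of how the Illinois DL is actually built: the class names, the sample titles, and the file names are all hypothetical, and volumes and issues are collapsed into a single level for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class File:
    name: str


@dataclass
class Article:
    title: str
    files: List[File] = field(default_factory=list)


@dataclass
class Issue:
    label: str
    articles: List[Article] = field(default_factory=list)


@dataclass
class Journal:
    title: str
    issues: List[Issue] = field(default_factory=list)


@dataclass
class Repository:
    name: str
    journals: List[Journal] = field(default_factory=list)


def count_units(repo: Repository) -> Dict[str, int]:
    """Count the managed objects at each level of granularity."""
    issues = [i for j in repo.journals for i in j.issues]
    articles = [a for i in issues for a in i.articles]
    files = [f for a in articles for f in a.files]
    return {
        "journals": len(repo.journals),
        "issues": len(issues),
        "articles": len(articles),
        "files": len(files),
    }


# Hypothetical sample data, for illustration only.
repo = Repository("example-repository", [
    Journal("Example Journal", [
        Issue("v1 n1", [
            Article("First article", [File("a1.sgml"), File("a1-fig1.gif")]),
            Article("Second article", [File("a2.sgml")]),
        ]),
    ]),
])

print(count_units(repo))
```

A reader, a librarian, and a system administrator would each look at a different row of that count, which is one way of seeing why a single level of granularity is unlikely to serve everyone.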
A DL consists of services, because we provide access and add value to the collection through processes of selection, indexing, and organization. We probably have communities of users giving us indications of what they would like to see in terms of the collection and services. So, we might take regular measurements of various sorts in order to understand if our DL is providing the levels of service users expect or may have even contracted for.
In large scale DL's there will be a web of institutions involved in supporting the DL including the communities of users, the libraries, the publishers, the owners of information, funding agencies, and the vendors. These relationships are typically contractual, and therefore significant effort is needed to keep these relationships on an even keel. The hosting institution, a university library, for example, will work within this community of institutions to define policies concerning services, collections, and so on. Once policies are in place, institutions would define their priorities; that is, what matters most to them. This in turn would affect the DL management policies. Some institutions might give priority to collection size and scope; others to the reliability and availability of the DL.
DL's are technologically intensive propositions. Hardware, software, and services like network capacity must be lashed together and made to interoperate. People need to be hired, trained, and coordinated to do all of this.
In terms of place, we have the hosting institution itself and the workplaces from which the DL is used, including those of the libraries, publishers, and information providers who interact with the DL. Work in support of collaboration in DL's, like Mike Twidale's [Twidale, Nichols, and Paice, 1996], suggests that virtual spaces will also be an important part of the DL.
What do I mean by "we"? I'm referring to who would be managing a DL. Libraries and librarians seem to be heavily involved in many cases, but so are others like the owners and operators of repositories, and the communities of users. In a corporate setting, it might be the MIS division, or the corporate information center that's responsible for DL management. I would expect to see a lot of variety in how particular DL's are managed in different settings.
There are some issues that are getting needed attention. Work on preservation and authentication of digital items is one area [Graham, 1993]. Work on interaction with users is another (for example, these Allerton Institutes), and DL economics and charging for use is a third [Sirbu and Tygar, 1995].
Ross Atkinson's recent discussion of the application of basic principles of librarianship to the digital world is an important piece [Atkinson, 1996]. He introduces the notion of the control zone: that is, libraries select items for inclusion in the control zone. When items are selected for this zone, they are given special treatment, for example by enhancing access and guaranteeing preservation and authentication of content. He also talks about the importance of coordinating various digital collections so that they appear to be a single virtual collection.
For the next few minutes I'm going to talk about an existing framework for managing complex distributed information systems that might have some utility in DL management. This framework has grown out of work associated with the ISO's Open Systems Interconnection model for communications systems. This work, originally focused on network management, is being extended to the management of distributed computing systems.
I haven't seen this framework applied to DL's yet, but it might be useful because DL's often seem to be designed as distributed systems.
This is a framework, and not a description of a particular implementation.
The Distributed Systems Management (DSM) framework is broken down into five functional areas. I'll talk briefly about each of these and give you a couple of examples of how these functional areas might apply to DL's.
Fault management is concerned with the propagation of all types of error conditions, or faults, to the management organization. So this function could provide the DL management organization with the real-time status of the DL. It might be nice to know if a certain database, repository, digital item, or file is available as part of the DL at any given moment.
Other activities, like how broken things get fixed, or how people who are having problems get help, can be included in the fault management function depending upon the needs and desires of the organization.
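As a rough illustration of what fault management could report, here is a minimal Python sketch. It assumes that each component of the DL reports a simple available / not available flag; the component names and the three summary states are hypothetical and are not part of the framework itself.

```python
from typing import Dict, List

# Hypothetical availability reports: component name -> is it reachable?
status: Dict[str, bool] = {
    "repository-a": True,
    "repository-b": False,
    "search-index": True,
}


def report_faults(status: Dict[str, bool]) -> List[str]:
    """Return the names of components currently reporting a fault."""
    return sorted(name for name, available in status.items() if not available)


def overall_state(status: Dict[str, bool]) -> str:
    """Summarize the whole DL as available, degraded, or unavailable."""
    faults = report_faults(status)
    if not faults:
        return "available"
    if len(faults) == len(status):
        return "unavailable"
    return "degraded"


print(report_faults(status))
print(overall_state(status))
```

The same structure could be applied at any of the levels of granularity discussed earlier, from whole repositories down to individual files.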
Configuration management is concerned with keeping control of and managing change in the environment. In the DL, these activities might include tracking which versions of an item are in the collection, which medium houses the item, or what the version / media history of the item is. This could also support activities like migration of items from format A to format B in support of collection preservation.
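A version / media history of this kind might be recorded as follows. This is a minimal Python sketch under assumed names: ItemRecord, Version, and migrate are hypothetical, and the formats and media are placeholders standing in for "format A" and "format B".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Version:
    number: int
    file_format: str
    medium: str


@dataclass
class ItemRecord:
    item_id: str
    history: List[Version] = field(default_factory=list)

    def current(self) -> Version:
        """Return the most recent version of the item."""
        return self.history[-1]

    def migrate(self, new_format: str, new_medium: str) -> Version:
        """Append a new version; earlier versions stay in the history."""
        next_number = self.history[-1].number + 1 if self.history else 1
        version = Version(next_number, new_format, new_medium)
        self.history.append(version)
        return version


# Hypothetical item: first stored in format A, later migrated to format B.
record = ItemRecord("article-001")
record.migrate("format-A", "magnetic disk")
record.migrate("format-B", "optical disc")

print(record.current())
print([v.number for v in record.history])
```

Keeping the older entries rather than overwriting them is what makes the history usable for preservation decisions later on.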
Security management is concerned with activities like access control, key management for encryption, and so on. In the DL, it may be necessary to distinguish people who are able to retrieve certain items from those who are able to modify, add, or delete items in the collection.
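The distinction between retrieving and changing items can be sketched as a simple role check. This is an illustrative Python fragment only; the two roles, "reader" and "maintainer", and the permission table are assumptions, and a real DL would set them by policy.

```python
from enum import Enum, auto
from typing import Dict, Set


class Action(Enum):
    RETRIEVE = auto()
    ADD = auto()
    MODIFY = auto()
    DELETE = auto()


# Hypothetical roles: readers may only retrieve, maintainers may do anything.
ROLE_PERMISSIONS: Dict[str, Set[Action]] = {
    "reader": {Action.RETRIEVE},
    "maintainer": set(Action),
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("reader", Action.RETRIEVE))
print(is_allowed("reader", Action.DELETE))
print(is_allowed("maintainer", Action.DELETE))
```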
Accounting management is concerned with measuring usage for the purposes of charging, recovering costs, etc.
Finally, the fifth functional area, performance management, is concerned both with measuring and analyzing past and current levels of use and with supporting capacity planning activities.
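Both of these last two areas start from the same raw material, a record of use. The following Python sketch assumes a hypothetical usage log of (user, hour of day) pairs; it counts requests per user, as an accounting function might, and finds the busiest hour, as a performance function might. The log entries are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical usage log: (user, hour of day at which a request was made).
usage_log: List[Tuple[str, int]] = [
    ("alice", 9),
    ("bob", 9),
    ("alice", 14),
    ("alice", 9),
]


def requests_per_user(log: List[Tuple[str, int]]) -> Counter:
    """Accounting view: how many requests did each user make?"""
    return Counter(user for user, _ in log)


def busiest_hour(log: List[Tuple[str, int]]) -> Tuple[int, int]:
    """Performance view: return (hour, request count) for the peak hour."""
    counts = Counter(hour for _, hour in log)
    return counts.most_common(1)[0]


print(requests_per_user(usage_log))
print(busiest_hour(usage_log))
```

What is charged for, and what counts as acceptable load, remain policy questions that a sketch like this does not answer.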
Digital libraries are allowing the library as an institution to reach out to users in what seem to be important ways. Specifically, digital libraries, largely through the nature of the technology used in their construction, relax constraints of time and place. I think that the relaxation of these constraints is potentially of great interest to library users, and therefore increasingly important to us as providers of DL's. Yesterday afternoon a couple of people here were talking about acquiring users from far beyond their home communities. All users, no matter how 'local' or 'remote' they may be, are able to get into the DL at times when the physical library is closed. If interest and use and complexity increase, it becomes important for us to have mechanisms to help us cope with these demands. These mechanisms also need to be designed into the DL's we build and be capable of being aligned so as to support any DL's policy objectives.
At this point, perhaps DL management is an example of invisible, unrecognized work.
I've also pointed to an existing framework used in the management of complex information systems, the Distributed Systems Management (DSM) framework. One practical reason that this might be valuable to us as a community is that commercial applications are available to support parts of the functional framework in network management. If DL designers were to augment existing and future DL's to interoperate with these off-the-shelf management systems, some of these functions might come a bit more under our control. This is a possibility.
Finally, I would urge the community to engage in additional dialog about the theory and practice of DL management. What are reasonable policies and service levels for DL's? How do existing theories about paper library management transfer to DL's? Can the DSM framework be fruitfully applied to DL's? What are the techniques we can use to put these theories of DL management into practice?
Atkinson, R. (1996). Library functions, scholarly communication, and the foundation of the digital library: laying claim to the control zone. Library Quarterly, 66(3), 239-265.
Graham, P. S. (1993). Intellectual Preservation in the Electronic Environment. Paper presented at After the electronic revolution, will you be the first to go?: Proceedings of the 1992 Association for Library Collections & Technical Services presidents program, 29 June 1992, American Library Association annual conference, San Francisco, CA.
Levy, D. M., & Marshall, C. C. (1994). What color was Washington's white horse? A look at the assumptions underlying digital libraries. Paper presented at Digital Libraries '94: Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, College Station, Texas, USA.
Sirbu, M., & Tygar, J. D. (1995). NetBill: An Internet Commerce System Optimized for Network-Delivered Services. IEEE Personal Communications, 2(4), 34-39.
Twidale, M. B., Nichols, D. M., & Paice, C. D. (1996). Browsing is a Collaborative Process (CSEG/1/96). Lancaster, UK: Lancaster University.
Last Updated: Feb. 5, 1997