|Abstract:||Government information published on the web meets differing expectations about the permanence of documents. The web is only about a decade old, and informal notions of web authoring and publishing from its earliest days persist in many circles. Yet terms like "government documents" convey, at least to the layman, an expectation of formality, official content, and permanence. If a permanent archive of electronic documents is to be constructed, it must address not only how to locate the appropriate document but also how to locate a specific version of that document. Metadata can contribute to the solution of these problems, but it raises issues of metadata quality and of consistency in metadata generation.
In the Preserving Electronic Publications project, we examined the complete web-accessible electronic document inventories of the US states of Illinois and Arizona. Statistics describing the profiles of these document inventories are presented herein. We further analyzed all markup-language documents to determine the current extent of metadata incorporation. As expected, metadata authoring at the individual state agency level was found to be "just beginning".
In addition to embedded HTML META tags, other useful descriptive information is often available in the headers of the HTTP messages sent from the web server to a requesting client program (e.g., a user's web browser, or our "web spider" acquisition system). Classification software based on an analysis of included keywords and phrases could automatically contribute some metadata. Extrapolating the subject(s) of a poorly tagged document from the subjects of its hypertext neighbors is another stop-gap measure. The author believes the use of automated methods to generate, infer, or extract such metadata to be the only economical alternative to the manual retrofitting of metadata into the very extensive document inventories of government agencies and other large organizations.
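As a rough illustration of the automated approaches mentioned above, the following Python sketch collects embedded META tags, falls back on HTTP header fields, and guesses subjects from hypertext neighbors. It is not the project's acquisition system: the names `MetaTagParser`, `extract_metadata`, and `infer_subjects` are hypothetical, the choice of header fields is an assumption, and the neighbor heuristic is a simple keyword tally chosen only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags, plus the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            # META names are compared case-insensitively, so normalize them.
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.meta.setdefault("title", data.strip())


def extract_metadata(html_text, http_headers):
    """Merge embedded META tags with selected HTTP response header fields.

    `http_headers` is assumed to be a plain dict of header name to value,
    as a spider might record it; the fields chosen here are illustrative.
    """
    parser = MetaTagParser()
    parser.feed(html_text)
    record = dict(parser.meta)
    for field in ("Last-Modified", "Content-Type", "Content-Length"):
        if field in http_headers:
            record.setdefault(field.lower(), http_headers[field])
    return record


def infer_subjects(neighbor_records, top_n=2):
    """Stop-gap guess: the keywords most common among linked neighbors."""
    counts = Counter()
    for rec in neighbor_records:
        for kw in rec.get("keywords", "").split(","):
            kw = kw.strip().lower()
            if kw:
                counts[kw] += 1
    return [kw for kw, _ in counts.most_common(top_n)]
```

A spider could call `extract_metadata` on each fetched page and, when the resulting record has no `keywords` entry, call `infer_subjects` on the records of the pages it links to or is linked from. Any subjects inferred this way are guesses and would need to be flagged as machine-generated in an archive.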