Vol. 20, No. 2 • February 2000
COMING FULL CIRCLE
Digital Preservation: Everything New Is Old Again
by Andrew K. Pace
Special thanks to Eric Morgan,
who first told me of the impending vacancy in the pages of CIL,
and who leaves some big shoes to fill. Fortunately, I can thank and consult
him as often as I like since he and I work together. And thanks to the
editors at CIL, not only for extending this opportunity to me, but
also for readily accepting what presented itself as a difficult theme for
this column—that library issues and trends are circular in nature, and
that we find ourselves in situations where past (and current) decisions
play an important role in our future development.
Now, to the Matter at Hand
This month’s theme of Archiving
and Preservation excited me as a new columnist—I concentrated in the book
arts in library school, which, naturally, led me to systems; however, the
breadth of coverage on the subject amazed me as I refreshed my research.
I want to use this opportunity to address some seemingly obvious assumptions,
present and evaluate the most common forms of digital preservation, and,
of course, raise more questions than I answer.
I will start with the assumption
that nearly every library in existence is doing something with digital
collections, but that relatively few have thought about how to preserve—both
access method and artifact—the collections they are building. Secondly,
I will not challenge the generally accepted notion that the life spans
of digital storage media, application software, and required hardware are
growing shorter with each passing Roman numeral of the Intel chip. All
those state-of-the-art machines, software packages, and compression techniques
seem old before the boxes and shrink-wrap even hit the landfill. And just
as quickly, we are faced with re-evaluating the technology that was supposed
to solve all of our problems—now there’s a circle we could have all done
without. Finally, I think we can all agree on the ultimate importance of
ensuring the longevity of special collections in general and digital collections
in particular.
Where We Stand Now
If digital libraries can
be said to have a tradition, we might call it the tradition of the “accessible
repository.” Building digital collections and designing effective access
to them have been such a focus that preserving the longevity of digital
objects and digital interfaces has too often become an afterthought. Meanwhile,
our capacity to store keeps increasing, while the longevity of the media—and
its hardware—decreases. This is reminiscent of 19th-century book production,
where a more literate populace placed a demand on publishers that was far
greater than their concern for the legacy of acidic deterioration that
they unwittingly passed on to libraries. Luckily for us, there are more
librarians and technologists concerned with “bit integrity” now than there
were wood and paper scientists (with a solution) then. When libraries take
digital projects into digital production, preservation should enter the
equation early and remain a strong consideration always.
Preservation Strategies, or ‘Raspberry Preserves’
It seems hard to analyze
the existing digital preservation options without giving them all a big
wet raspberry. Perhaps this is why no single strategy has presented itself
as the clear front-runner. So we have to do something, but what? In most
cases, we have already selected the collections that we wanted in digital
form; we’ve done usability testing and made sure the service would scale
appropriately with its use. Without even picking a new strategy, we already
need to make the fiscal commitment (both staff and ongoing infrastructure).
How can we be sure that the preservation strategy we choose will work,
and will we need to start over again when we’re done and another option
has presented itself? I am not going to attempt to answer all of these
questions, but I will present the five major digital preservation strategies.
I will also add a parallel strategy that some fear is too often overlooked,
and I will try to find some common ground on which we can all stand.
Refreshing. The first—and probably most common—alternative is refreshing. Think of the verb, not the adjective, lest we think this first option is new and insightful. Refreshing involves transferring digital materials to a new medium, for instance, changing from 5 1/4-inch floppies to CD-ROM, or from CD-ROM to DVD.
Migration. The second common activity is migrating to a new format. If you were at my university right now, you might find yourself somewhat adept at saving that WordPerfect 7 document as a new Word 97 file.
Both of the examples I’ve given so far are somewhat simplistic. Let’s imagine a more creative one. What would you do if one of your Mac OS 9 FoxPro users wanted to review the historical data that your predecessor created in dBASE III on an IBM 286 and “archived” on a 5 1/4-inch double-sided floppy? The common answer: “How badly do you really need that data?” This case would require a great deal of refreshing and migration to ensure that the request could be met again in the future. Such work is very labor intensive, but hardly unique to our profession. Recall the ongoing efforts at preservation microfilming, in which resources are migrated to a new format that is itself continuously refreshed as the technology improves. We stand to learn a lot from the successes and failures of those efforts.
Refreshing and migration raise another important preservation issue. Libraries must consider whether to treat digital materials as artifacts or simply as intellectual content. If I save my column in ASCII text (better make that Unicode!) on a good floppy, and faithfully refresh that medium as time goes by, I have preserved the intellectual content. This model fits my personal preservation and archival mission. Now, the equally faithful CIL editors would undoubtedly have a different mission. Their archival copy might preserve layout, graphics, varied type fonts, and (although I can’t possibly imagine why) edits. Ironically, when CIL and I sit down in our separate worlds to consider access to this content, we might create identical catalog records with identical metadata descriptors—but more on that problem later.
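For readers who like to see the mechanics, refreshing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sha256_of` and `refresh` are hypothetical, a plain file copy stands in for the move to a new medium, and a SHA-256 checksum stands in for a check on “bit integrity.” It preserves the bits and nothing more, which is to say the intellectual content rather than the artifact.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target (standing in for the new medium) and verify the bits.

    Hypothetical helper: raises IOError if the copy does not match the original,
    otherwise returns the checksum so it can be recorded for the next refresh.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"refresh of {source} failed: checksum mismatch")
    return after
```

Recording the returned checksum alongside the file would let the next refresh, years later, confirm that nothing decayed in between.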
Technology Preservation. This option can only be described as untenable. I still keep a Commodore 64 somewhere in my parents’ basement in the faint hope that one day I may just need to resurrect that paper that I wrote with WordStar back in 1985. You might argue that this has always been the plan for other types of collections, like the Library of Congress phonographs, or any of the thousands of nationwide collections containing Kodak slides. Preserving record players and slide projectors, however, is on a completely different scale from preserving digital technology: The hardware and software for digital media change so rapidly that it would be impossible to keep an up-to-date technology museum.
Not only is it impossible to keep all of that hardware around, but there are several nuances to technology preservation that one might not think of at first. Can any of us truly remember what our first Web page really looked like? Maybe it was designed for Lynx or Netscape 1.0. In 1995, I created Web pages with low bandwidth as a given; Webmasters broke up pages into sometimes ridiculously small pieces; images were kept at a bare minimum. In the (relatively) high-speed 21st century, that context is completely lost. It’s hard to imagine someone saying, “Let me try this on a 386 at 2400 bps using Netscape 2.0, so that I can get a feel for how this Web page was meant to work,” but it would prove equally difficult to ever re-create that experience. The Web page, as an archival artifact, ceases to exist almost as soon as it is created.
Digital Archeology. I borrow this wonderful phrase from Oya Rieger at Cornell University (look for her chapter in Moving Theory into Practice due out this spring).1 It seems only slightly more justifiable than technology preservation. I think of digital archeology as the solution that IT gurus and purse-string holders like best; the former are banking on this solution, and the latter like that it doesn’t come out of this year’s budget. “Just get it in a digital format, and we promise that you will always have access to it.” It sounds a lot like what some IT experts were telling us when libraries began exploring digital technology. Remember the days when converting to digital collections was the answer to preservation? You might think of this option as the logical extension to refreshing, migration, and technology preservation; digital archeology is the gamble upon which all of these are based.
Emulation. Even if this option is not the best available, it seems, at present, the most intriguing. Emulation involves retaining information about how a digital collection was created and accessed so that future access can be accurately and faithfully reproduced. Jeff Rothenberg—probably the leading proponent of this option—writes, “For digital documents, retaining an original may not mean retaining the original medium ... but it should mean retaining the functionality, look, and feel of the original document.” 2 Using my earlier example, imagine a piece of middleware that might emulate Netscape 2’s rendering of HTML 1.0 over a 2400-baud modem. This option seems most suitable for documents that are “born digital,” but might still prove problematic for the representation of the original that we are digitizing. This brings me to the “forgotten” option of preservation.
Preservation Through
Redundancy. This method was “introduced” in the era of digitization
by Paul Conway of Yale University. Let’s not forget that the original documents
that we digitize often deserve some form of preservation in addition to
digitization. This is where the traditional preservation units must reach
middle ground with the digital libraries concerned with access. When does
a new format begin to stand for the original? This is the question that
preservation microfilm experts are still debating and the one that digital
librarians and digital preservationists should be asking themselves. Conway
writes much more knowledgeably about the comparisons of digital preservation
with traditional preservation, so I will let his work stand for mine in
that regard.3 Conway and others argue
that the core elements of preservation should be carried forward through
the digital age. The circle continues.
Common Ground to Stand On
It’s tough to get through
a conversation about digital activity in libraries these days without mentioning
the 1990s buzzword “metadata.” I was going to try, but perhaps I can make
up for it by digressing from subject descriptors, search engines, and Dublin
Core. The common ground on which all digital preservation practices will
find themselves is good administrative metadata. Think of traditional metadata
as that which describes the intellectual content, the plain text version
of the document. Administrative metadata is what is needed to carry out
the strategies outlined above. Personally, I remain skeptical about the
short-term practicality and long-term viability of intensive metadata description;
I feel strongly, however, about creating the metadata content that describes
what Rothenberg calls the “functionality, look, and feel” of digital materials.
Administrative metadata might describe everything from the hardware and
software with which the document was created to structural elements like
file size, image resolution, provenance, and data quality.
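As a rough illustration of what such a record might hold, here is a minimal Python sketch. Everything in it is an assumption made for the example: the class name `AdministrativeMetadata`, the field names, and the sample values are hypothetical and do not follow any published metadata standard. The fields simply mirror the elements listed above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AdministrativeMetadata:
    """Hypothetical administrative record for one digital object."""
    creation_hardware: str               # machine the object was created on
    creation_software: str               # application and version used
    file_size_bytes: int                 # structural element: size on disk
    image_resolution_dpi: Optional[int]  # None for non-image material
    provenance: str                      # where the object came from
    data_quality: str                    # free-text note on known defects


def to_json(record: AdministrativeMetadata) -> str:
    """Serialize the record as plain text so it can be refreshed with the object."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Illustrative values only, borrowed from the dBASE III scenario earlier.
example = AdministrativeMetadata(
    creation_hardware="IBM 286",
    creation_software="dBASE III",
    file_size_bytes=368640,
    image_resolution_dpi=None,
    provenance="Created by a predecessor; stored on 5 1/4-inch floppy",
    data_quality="Unverified since original archiving",
)
print(to_json(example))
```

Keeping the record in a plain-text form means it can be refreshed and migrated along with the object it describes, rather than becoming one more format to rescue.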
You Should Know Your Vendors Like Yourself
Let’s also not forget that
most of our libraries buy and lease more digital collections than they
create themselves. Library vendors provide another common bond, and those
who want to maintain the commercial viability of their data recognize preservation
needs, too. Libraries should make sure that current and potential vendors’
strategies match their own. Will your vendors provide archived digital
copies upon termination of license agreements, and at what cost? Is “continued
online access” a viable substitute for locally archived material? These
are difficult questions when what we really want is perpetual access to
what we have paid for, in a refreshed and updated manner—forever.
Have we come full circle
on digital technology and the digital materials we are trying to preserve?
The circle’s beauty is that it has no beginning or end, just points along
its curve, ahead and behind. I like to think of this circle as a
sort of job security. Sometimes the questions ring a bell, sometimes the
answers do. The circle might go on forever, but the questions and answers
keep getting better.
Andrew K. Pace is assistant
head, systems at North Carolina State University Libraries in Raleigh.
He acts as the primary liaison between the Systems Department and the Department
for Digital Library Initiatives. His e-mail address is andrew_pace@ncsu.edu.
References
1. Kenney, Anne R., and
Rieger, Oya Y. Moving Theory into Practice: Digital Imaging for Libraries
and Archives. (Mountain View, CA, Research Libraries Group, spring
2000).
2. Rothenberg, Jeff. Avoiding
Technological Quicksand: Finding a Viable Technical Foundation for Digital
Preservation: a Report to the Council on Library and Information Resources.
(Washington, DC, 1999).
3. Conway, Paul. Preservation
in the Digital World. Resources (Washington, DC, 1996).