THE SYSTEMS LIBRARIAN
Now That It's All Digital, Where Do
I Put It? Exploring Data Storage Technologies
by Marshall Breeding
The thing about cutting-edge technology is that it dulls
so quickly. The hardware, software, and technology concepts
that today seem blazingly fast, superabundant in capacity,
or transformative in their effects will in just a few
short years be considered mediocre or passé. Many
cutting-edge technologies fizzle out and slip into obscurity
once the hype dies. Yet, it's important to follow the
latest in technology and to ride as close to the leading
edge as we dare—or at least as close as we can afford.
It's also good to take note of the technologies that have
passed their prime. Technology that's actually practical
to use lies somewhere in between the cutting edge and
the obsolete.
The area of technology that I struggle with the most
is data storage. Like most libraries, we're involved
in projects to digitize portions of our collections.
Besides those projects, everyday computing—both
at home and at work—involves the need to constantly
store, transfer, and back up data. My involvement with
the Vanderbilt Television News Archive causes me to
think about all the available possibilities for storing
large amounts of data. We are currently working on a
project to digitize our videotape collection, and finding
ways to store, archive, and move the data has proved
to be an enormous challenge. We're producing almost
3.5 terabytes of content a month, and at the end of
the project we'll have created over 130 terabytes. While,
in general, the capacity of storage devices increases
each year as the cost per megabyte goes down, at the
scale of this project, current capacities are inadequate
and costs are too high.
Yet not all storage needs are large-scale. There are
times when the need is for something small and portable. Whether
you want to store a megabyte, a gigabyte, or a terabyte,
there are a lot of great technologies available today.
I'll cover some here.
The Diskette Is Dead
First of all, let's recognize that the days of floppy
disks or diskettes have passed. The diskette is dead.
Once standard equipment on computers, diskette drives are
now offered only as added options on most new models. The 1.44
MB offered by the latest generation of 3.5-inch diskettes
just doesn't hold enough data to be useful in today's
world. Therefore, it is important to transfer information
that you have on diskettes or floppy disks before it
becomes hard to find drives that can read them. One
of the realities of data storage in the digital world
is the need to constantly refresh and transfer content
to current technologies. The problem especially applies
to librarians who may have items in their library collections
that include content delivered on now-obsolete media,
such as books with supplementary materials supplied
on diskettes. It might be a good idea to keep a computer
equipped with a diskette drive available until you are
positive that everything in your library's collection
has been transferred.
Optical Storage Solutions
Optical discs have taken over as the preferred media
for portable storage. CD-R and CD-RW are convenient
and low-cost ways to store and transport data and music.
Blank CD-R discs sell for mere pennies and offer up
to 700 MB of storage. Almost any new computer comes
equipped with a CD-R drive and the software for burning
data or music onto discs. While no longer on the cutting
edge, optical discs are solid, practical technology,
and they are far from extinction.
While CD-R is holding its own, recordable DVD is a
rising star. The storage capacity of CDs (while fine
for data or audio files) falls short of what's needed
for video. While commercially pressed DVDs have been
around for quite some time, the hardware and software
for burning your own have been a bit pricey until recently.
The cost of recordable DVDs has come down enough in
the last year to make them well-suited, not just for
video, but also for a variety of applications that deal
with large amounts of data. Three years ago, when I was
planning and budgeting a video project, a DVD-R drive
sold for just under $1,000 and the blank discs cost
as much as $20 each. Now, drives are under $200 and
a blank disc costs well below $1. At 15 to 25 cents
per GB, DVD-R stores data relatively inexpensively.
While the discs function well as backup storage, their
fragility makes them too unreliable for long-term
archiving.
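The per-gigabyte figure comes from dividing the price of a blank disc by its capacity. The short Python sketch below shows the arithmetic. It assumes a single-layer DVD-R holds 4.7 GB, a figure not stated in this column, and the disc prices in the example are hypothetical values chosen only for illustration.

```python
import math

# Assumption: single-layer DVD-R capacity of 4.7 GB (not stated in the column).
DVD_R_CAPACITY_GB = 4.7


def cost_per_gb(disc_price_dollars: float,
                capacity_gb: float = DVD_R_CAPACITY_GB) -> float:
    """Return the media cost in dollars per gigabyte for one blank disc."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return disc_price_dollars / capacity_gb


def discs_needed(data_gb: float,
                 capacity_gb: float = DVD_R_CAPACITY_GB) -> int:
    """Return how many discs are needed to hold data_gb, rounding up."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return math.ceil(data_gb / capacity_gb)


if __name__ == "__main__":
    # Hypothetical disc prices, for illustration only.
    for price in (0.75, 1.00):
        print(f"${price:.2f} per disc -> ${cost_per_gb(price):.3f} per GB")
    print(f"Discs needed for 100 GB: {discs_needed(100)}")
```

The same two functions work for any other medium if you pass in its capacity, which makes it easy to compare options before budgeting a project.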
One of the maddening realities in the realm of DVD
is the different media types and drives. Options include
DVD-R and DVD-RW (supported by a group of manufacturers
called the DVD Forum) and DVD+R and DVD+RW (supported
by the DVD+RW Alliance). While each camp has its own
slight advantages, the lack of a single standard
makes selecting equipment and media more complicated
than necessary. Fortunately, a large portion of
DVD drives support all of the different media options.
While DVD-R has improved dramatically over its short
lifecycle, I still find the cost-versus-capacity options
unsatisfactory for large-scale video projects. The next generation of optical
storage, based on blue lasers rather than the red ones
used today, promises some improvement. Dubbed Blu-ray
(http://www.blu-ray.com), these discs use a dual-layer
approach to hold as much as 50 GB. Don't expect to see
this technology on the shelves until the end of 2005.
Unfortunately, by then, 50 GB of storage may not seem
so impressive.
These days, we have lots of options for computer storage
based on magnetic drives. Individual workstations now
come with hard drives from 80 to 300 GB. That seems
generous, but now that many have graduated from listening
to music in MP3 format to watching videos in MPEG,
even today's largest drives can quickly become cramped.
Still, the vast majority of users will find that the amount
of storage offered on most current computer models is
quite sufficient for their needs. When more capacity
is needed, the EIDE architecture used by most desktop
computers makes it easy to install an additional drive.
Having half a terabyte of storage on a PC is quite feasible.
The ability of relatively low-cost PCs to offer large-scale
storage meets my standards for great technology.
Storing Outside the Box
Large-capacity external drives that connect through
USB or FireWire rate high on my list of useful technologies.
These come in just about any size. I've seen them as
small as 40 GB and as large as 1.6 TB. Remember the
days of "sneakernet," when the only way to
move files from one computer to another was by copying
them to a diskette and walking them over to the recipient?
Today, a large external drive can be extremely handy
when you need to transfer very large data sets. Suppose
you need to move 500 GB worth of data from one organization
to another located across the country. Using a DSL connection
at 1.5 Mbps (megabits per second), the transfer would take
about 31.6 days, nonstop. It would definitely be faster to copy
the data to an external drive and ship it. These drives
can also serve as a fast and convenient way to make
backup copies of data. I have a database of image files
that takes about 30 DVDs to back up. Making an extra
copy on a 250 GB external USB drive is fast and convenient,
though a bit more expensive. (Expect costs of about
$1 per gigabyte.)
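The 31.6-day transfer estimate can be reproduced with a short calculation. This Python sketch assumes the DSL line runs at 1.5 megabits per second and that a gigabyte is counted as 1,024 megabytes, which is the combination that yields that figure. It ignores protocol overhead and interruptions, so a real transfer would likely take longer.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_gb: float,
                  rate_megabits_per_sec: float,
                  mb_per_gb: int = 1024) -> float:
    """Estimate the days needed to send size_gb over a link of the given speed.

    Assumptions: 8 bits per byte, mb_per_gb megabytes per gigabyte,
    and a link that runs at full speed the whole time.
    """
    if rate_megabits_per_sec <= 0:
        raise ValueError("rate_megabits_per_sec must be positive")
    size_megabits = size_gb * mb_per_gb * 8
    seconds = size_megabits / rate_megabits_per_sec
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    days = transfer_days(500, 1.5)
    print(f"500 GB at 1.5 megabits/sec: about {days:.1f} days")
```

Counting a gigabyte as 1,000 megabytes instead (`mb_per_gb=1000`) shortens the estimate by less than a day, so the conclusion is the same either way: shipping a drive is faster.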
Network-Based Storage
Network-based storage options available today include
server-attached storage, network-attached storage (NAS),
and storage area networks (SANs). Selecting and building
a large-scale storage system is a complex issue and
may ultimately involve a combination of these technologies.
I won't go into the details here, but for my latest
projects I have steered away from SAN technologies.
My previous experience with them led me to the opinion
that unless there are very specific needs that require
high manageability, ultra-high availability, and extremely
large capacity, then the cost and complexity of a SAN
may not be warranted. I'm currently using a cluster
of servers with simple-attached storage to provide about
14 terabytes of storage for our digital video system.
You Can Take It with You
You don't always need a lot of storage; sometimes you
just need a modest amount, but you want it to be small
and portable. USB-attached flash drives fill this niche
superbly. With capacities from a few megabytes up to
a couple of gigabytes, these devices provide a convenient
way to carry your presentation to a conference, to move
files between home and work, or to do any number of
chores. And they're incredibly easy to use—just
plug the tiny device into the USB port of any relatively
current computer and it almost instantly shows up as
an additional drive.
Choose It and Use It Wisely
Data storage is not a one-size-fits-all proposition.
Every project or problem brings a different focus of
concern: capacity, portability, long-term archiving,
performance, economy, reliability, or flexibility. While
data storage remains the one technology problem that
I wrestle with the most, fortunately, solutions abound.
Keep in mind that digital data is fragile. Any given
copy of a file can be instantly destroyed through a
hardware or software failure or through human error.
Don't ever leave yourself in the situation of having
only one copy of your data—make multiple copies
and keep them in separate locations. It is important
to remember all the different technology options that
we have available for safely storing the data we create.
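One way to act on the advice about multiple copies is to copy a file to several destinations and confirm that each copy matches the original. The Python sketch below is a minimal illustration, not a complete backup system. The function names and the directory names in the example are hypothetical, and the destination directories stand in for separate drives or locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination_dirs: List[Path]) -> List[Path]:
    """Copy source into each destination directory and verify every copy.

    Raises OSError if any copy's checksum differs from the original's.
    """
    expected = sha256_of(source)
    copies = []
    for directory in destination_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies contents and file timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Demonstration in a temporary directory; real copies belong on
    # separate devices kept in separate locations.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "sample.dat"
        original.write_bytes(b"example data" * 1000)
        for copy in copy_and_verify(original, [root / "copy_a", root / "copy_b"]):
            print(f"Verified copy: {copy}")
```

Checking the checksum right after copying catches a bad write immediately. Rerunning the check on stored copies from time to time can also reveal media that has started to fail.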
Marshall Breeding is the library technology officer
at Vanderbilt University in Nashville, Tenn., and a consultant,
speaker, and writer in the field of library automation.
His e-mail address is marshall.breeding@librarytechnology.org.
You can also reach him through his Web site at http://staffweb.library.vanderbilt.edu/breeding.