Feature
Preserving the 'Athens of Indiana' Through Digitization
By Bill Helling
A few weeks after I came to the
Crawfordsville District Public Library in Indiana as its first systems librarian,
I was asked to spearhead a very ambitious digitization project. Two of our
senior librarians, Mary Johnson and Judy Spencer, had initiated the project
a year earlier, in 1998, after realizing that the local history collection
was in danger: Some items were already damaged by frequent use, and some were
lost due to either misplacement or theft. These librarians saw digitizing as
an obvious way to preserve the originals. They were admittedly not able to
handle any of the technical aspects of digitization, but they did their homework
by visiting several regional libraries involved in such efforts. Though they
did not find any public library of our size to use as a model, they requested
a grant of $62,000 from our Montgomery County Community Foundation. The foundation
approved the grant in early 1999, shortly after my arrival. All they needed
at that point was to turn the grant proposal into a reality, and that was
where I came in.
Checklist for Getting Started
Take an inventory of your collection. This is a good
time to fix cataloging problems, reunite items belonging together,
weed, and prioritize.
Gather any type of indexing already done, whether in
book form or on cards.
It may be ready for simple data entry.
Search for any type of electronic data that may exist
from previous indexing projects to see if this data can be imported
into your database application.
Establish authority control and indexing conventions
that can be used to clean up existing indexes and to guarantee the
consistency of future ones.
Train current indexers on proper data entry in an application
that they can manage and that you can import into a database.
Establish a list of who is indexing what in order to
avoid conflicts and to promote a sense of ownership.
Assign someone to be responsible for supervising the
data-entry process and collecting the new data from its various points
of production.
Perform clean-up duties before importing data into your
database by using the strengths of the different applications that
the data may arrive in. For example, use Excel to sort data by different
fields (columns) in order to find obvious entry errors.
Back up all electronic data on a regular basis and distribute
it to safe places.
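The Excel tip in the checklist works because sorting places blanks, stray capitals, and malformed values next to each other, where they are easy to spot. When data arrives as tab-delimited text, the same check can be scripted. The following is a minimal sketch in Python, assuming a header row and one tab between columns; the function names are hypothetical and are not part of any product mentioned in this article.

```python
import csv
import io

def sort_by_field(tsv_text, field):
    """Parse tab-delimited text with a header row; return rows sorted by one field."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    # Case-insensitive sort: blank values float to the top and spelling
    # variants such as "Smith" and "smith " end up side by side.
    return sorted(rows, key=lambda r: (r.get(field) or "").strip().lower())

def rows_with_wrong_width(tsv_text):
    """Return (line_number, line) for rows whose field count differs from the header."""
    lines = tsv_text.splitlines()
    expected = len(lines[0].split("\t"))
    return [(n, line) for n, line in enumerate(lines[1:], start=2)
            if line and len(line.split("\t")) != expected]
```

Rows reported by `rows_with_wrong_width` usually point to an extra or missing tab, the same kind of error that sorting in a spreadsheet tends to reveal.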
Surveying the Past Glory
The library serves the small town of Crawfordsville, with its 14,000 residents,
as well as the surrounding area. This town has real historical significance: It
was an important gateway in the 19th century for settlers moving between the
East and the West. It is still the home of Wabash College, founded in 1832,
and it was the residence of several now-forgotten but once well-published writers
such as Lew Wallace, of Ben-Hur fame, and Maurice Thompson, who penned Alice
of Old Vincennes. Henry S. Lane, a leading figure in the founding of the
Republican Party and friend of Abraham Lincoln, lived two blocks from the present-day
library. Such was the reputation of its inhabitants and its culture that Crawfordsville
was referred to as the "Athens of Indiana"!
The library here is thus a repository for a large number of historical items,
many dating back to the early days of the county. Our original building, the
Carnegie Library, was dedicated in 1902; today the library's 108,000 items fill not
only that building but also a remodeled car dealership now connected
to it. Our small Local History room can't hold much: We are using space on
another floor, garage space left over from the car dealership, and part of
the reference department for storage. Because of the popularity of the local
history collection, our staff struggles to answer the constant mail, e-mail,
and phone queries as well as to serve the researchers coming from all over
the country.
Gathering the Right Tools
By the time the grant was awarded, we were at the point where we had to act
to preserve our endangered heritage while making it easier to access. At the
outset, we were simply planning for in-house access of our digitized images.
We intended to scan for archival purposes any manuscript or item of historical
interest and to burn these images onto CDs so that we could store the originals safely
away. The archive image (a 600-dpi uncompressed TIFF image) would be suitable
for printing. We were then prepared to save a copy of the image as a much smaller JPEG
file (at 150 dpi) or Adobe PDF file for viewing on local computers. We were
also planning to invest in a simple database program to help us keep track
of the images.
We knew that we needed a special piece of equipment to scan many of the local
history items. The pages of old volumes tend to crumble, and their inflexible
spines split when opened too far. Many of the volumes are also several inches
thick and are unwieldy because of their weight or size.
The grant request specified $18,000 for an overhead scanner (for face-up
scanning), and I decided upon the Minolta PS 7000 for several reasons. First,
this grayscale scanner covers a maximum area of 17 x 23 inches at 400 dpi (and
11 x 17 or less at 600 dpi), which was ideal for our oversized items. I was
also satisfied with the way it works with Eastman Software's Imaging for Windows
(Professional Edition) in straightening curved text and lightening the gutter
area, an important consideration for scanning thick volumes. In addition,
the surface of this overhead scanner is adjustable so that it can cradle the
spine of a large work while allowing for automatic or manual focus.
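To give a sense of why the archive files take up so much room, here is a rough size calculation for the scanner's maximum areas. It is a sketch that assumes 8-bit grayscale (1 byte per pixel), a bit depth the specifications above do not state, and it ignores TIFF header overhead.

```python
def scan_size(width_in, height_in, dpi, bytes_per_pixel=1):
    """Return (width_px, height_px, uncompressed_bytes) for a scan.

    bytes_per_pixel=1 assumes 8-bit grayscale; this is an assumption,
    not a published figure for the scanner.
    """
    w_px = round(width_in * dpi)
    h_px = round(height_in * dpi)
    return w_px, h_px, w_px * h_px * bytes_per_pixel

MIB = 1024 * 1024
cases = {
    "17 x 23 in at 400 dpi": (17, 23, 400),
    "11 x 17 in at 600 dpi": (11, 17, 600),
    "11 x 17 in at 150 dpi (display copy)": (11, 17, 150),
}
for label, (w, h, dpi) in cases.items():
    w_px, h_px, size = scan_size(w, h, dpi)
    print(f"{label}: {w_px} x {h_px} px, about {size / MIB:.1f} MiB uncompressed")
```

Because the pixel count grows with the square of the resolution, a 150-dpi display copy has one-sixteenth the pixels of the same page at 600 dpi, even before JPEG compression shrinks it further.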
As for our database software, I decided that DB/TextWorks by Inmagic would
serve us well because we could easily design custom databases around our different
data. At the time, Inmagic was also offering DB/SearchWorks, which would enable
us to share network copies of databases built with DB/TextWorks. In addition,
DB/SearchWorks would allow users to view images attached to the records in
the database.
Hardware, software, and supplies accounted for some $25,000 of the grant.
However, none of the technology purchases would matter without the staffing
to take advantage of them. The grant writers fortunately had figured in money
to hire a part-time staff person for 3 years to perform scanning duties.
Rethinking the Approach
Even the simplest plans change once a project gets going, however. In working
with Inmagic, I soon became aware of its DB/Text WebPublisher product. WebPublisher
runs on a Web server and converts the results of a DB/TextWorks search to HTML, thus
making the results viewable in any Web browser.
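WebPublisher's internals are not described here, but the general idea of turning a search result into a browser-viewable page can be illustrated with a short sketch. The function below is hypothetical and is not WebPublisher's interface; it simply renders a list of records as an HTML table, escaping special characters so that entries containing "&" or "<" display correctly.

```python
from html import escape

def results_to_html(records, fields):
    """Render search results (a list of dicts) as a simple HTML table.

    Hypothetical illustration only; not how WebPublisher is configured.
    """
    header = "".join(f"<th>{escape(f)}</th>" for f in fields)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r.get(f, '')))}</td>" for f in fields) + "</tr>"
        for r in records
    )
    return f"<table><tr>{header}</tr>{body}</table>"
```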
The chance to publish data on the Web caused me to reconsider the original
plan. If we were going to produce a screen-viewable image in addition to the archive
image, why limit users to in-library viewing? Unfortunately, making these images
available on the Web required far more storage space than our ISP
would allow. We also needed to install WebPublisher on the Web server and
have access to its configuration files. I soon realized that we would have
to run our own Web server if we really wanted to maximize access to our collection.
Although WebPublisher and its maintenance costs were not figured into the grant
request, our library director, Larry Hathaway, worked hard to free up the money
from an already-tight library budget. As for the Web server, we were able to
use a grant from the Library Services and Technology Act (LSTA) to purchase
a Dell PowerApp server with enough power and storage space to suit our needs
for years to come. For the time being, I decided to keep the main library Web
pages hosted on the ISP's server because, as the only systems librarian, I
could not possibly provide round-the-clock support for our Web gateway. I was,
however, willing to manage our Local History Web server, and I can simply link to the
databases on this server from our ISP-hosted library Web site, http://www.cdpl.lib.in.us/lh.
By early 2000, the pieces were in place and we were ready to begin our preservation
effort in earnest.
Preparing the Dig Site
Before we even made any plans, we knew that we had to first scan our most
valuable "museum" pieces. These items included local business accounts from
1837 to 1839, Church of God records of 1842 to 1853, the Montgomery County
Registry of Negroes and Mulattoes of 1853, some miscellaneous records of William
Bratton (a member of the Lewis and Clark expedition who settled near here in
1822), and so on. After we were satisfied that the most important items were
digitally preserved and then safely stored, we felt able to move on to a more
systematic approach.
Unfortunately, no single person knew the true scope of our holdings. The
library was automated, but the local history collection was not entered into
the system. The old card catalog could have been written in an ancient language
for all the help it was. A volunteer of many years, Dellie Craig, had the best
understanding of the collection. We met repeatedly until we had mapped what
we possessed and where it was buried. At this point we realized that we had
so much more than just historic volumes and manuscripts to preserve as images:
We had inherited the labors of previous library staff and volunteers who had
been indexing our newspapers, local magazines, and other resources for decades.
As a result, the library had many shelves of typed indexes and drawers crammed
full of handwritten catalog cards. We would have to enter all the hard-copy
material into our DB/TextWorks databases, of course, but using these ready-made
sources was a natural starting point before digging deeper.
After some deliberation, we decided to produce two main types of databases.
One type contains index records only, for cases in which an image of the item a
record describes is unnecessary or impossible. For example, our cemetery database, which
volunteers are transcribing directly from nearly 100,000 handwritten cards,
is a reading of every headstone erected in the county up until 1981. This database
most definitely does not need to lead to an image of every headstone.
For other databases, however, a link to an image is often possible, and
even necessary. We have a set of marriage records from 1888 to 1941 indexed
only by groom and bride names. Each record links to an image of the actual
page so that researchers can also discover parents' names, places of marriage,
occupations, and so on. The only alternative to using an image link would be
for us to transcribe everything mentioned on each marriage record!
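The difference between the two database types can be sketched as two record layouts. The field and file names below are hypothetical, chosen only for illustration: A cemetery entry carries the full transcribed reading and no image, while a marriage entry indexes only the two names and a year and points to a scanned page for everything else.

```python
from dataclasses import dataclass

@dataclass
class CemeteryEntry:
    # Index-only record: the transcription itself is the useful content.
    name: str
    cemetery: str
    inscription: str

@dataclass
class MarriageEntry:
    # Image-linked record: only names and year are indexed; parents' names,
    # places of marriage, and occupations are read from the scanned page.
    groom: str
    bride: str
    year: int
    page_image: str  # hypothetical file name of the scanned register page

def find_marriage_pages(entries, name):
    """Return page images for entries whose groom or bride contains `name`."""
    needle = name.casefold()
    return [e.page_image for e in entries
            if needle in e.groom.casefold() or needle in e.bride.casefold()]
```

A search on a surname returns the page images to open, which is the point of the design: Indexing two names per record is far cheaper than transcribing every detail on the page.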
Collecting the Artifacts
Deciding which resources to digitize can often be difficult, given that we
have so many items to choose from. But aside from digitizing the items that we
feared losing most, we base our decisions on what researchers need
and tend to request. At any given moment we have library staff and volunteers
entering hard-copy data, either from an existing print index or from the original
item, into a program that they feel comfortable using, such as Microsoft Word,
Excel, or even Access. It is simply not possible (or desirable) to have
everyone enter data directly into our database program.
Our part-time staff person subsidized by the grant does all our scanning
for archival and display purposes, and we both enter data directly into DB/TextWorks.
We work together to pool the images and data that we have, and we always keep
several different databases going at one time. Although we have tried, it is
almost impossible to predict how long it will take us to dig up and
dust off our data because so many factors are involved, including how busy I am with
my other duties, the time the volunteers can give us, and the number of accompanying
scanned images, if any.
Encountering Problems, Learning Valuable Lessons
Some intrepid indexers began to use word processors several years before
my arrival. Unfortunately, they regarded word processing as merely a convenient
way to produce a nice print copy before discarding the electronic file! No
need to keep those pesky files around after printing, right? I quickly ended
this practice. But even the electronic data produced after my arrival (and
now being saved) presented problems. For example, those using Word did
not think about how to enter the data so that it could easily be moved into
a database. Well-meaning volunteers used tab stops, spaces, and hard returns
indiscriminately to line up data into neat columns instead
of using just one tab between columns or simply using a table. I am now
closely supervising all electronic data entry in order to minimize my conversion
problems. I also know exactly what type of data-entry mistakes to look for,
and I spend more time cleaning data before I begin moving it into DB/TextWorks.
Nevertheless, I am willing to have volunteers work in their comfort zones in
any format that I can later clean up and import. We can't succeed without volunteers.
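Much of that cleanup amounts to turning visually aligned text back into one-tab-per-column rows. Here is a minimal sketch in Python, assuming that words within a field are separated by single spaces and that any tab, or any run of two or more spaces, marks a column boundary. Hard returns in the middle of a record are not handled and still need a human eye.

```python
import re

# Assumption: a column gap is any tab (with surrounding whitespace)
# or a run of two or more spaces; single spaces stay inside a field.
COLUMN_GAP = re.compile(r"\s*\t\s*| {2,}")

def to_tab_delimited(line):
    """Collapse tab/space alignment into exactly one tab between columns."""
    return "\t".join(COLUMN_GAP.split(line.strip()))
```

A line typed as `Smith, John` followed by three tabs, `May 3, 1890`, four spaces, and `Page 12` becomes three clean tab-separated fields ready for import.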
The hard-copy data compiled over many years was also a source of many problems.
Some indexers created authority files but rarely updated them before putting
them aside. Nothing was carved in stone: Spelling, capitalization, and anything
else that could evolve into different forms did so. These inconsistencies
showed up all too clearly in digital form, so we began to put more effort into
establishing and following indexing conventions.
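Once preferred forms are settled, variant spellings can be checked against them mechanically. This sketch assumes a hypothetical authority list of preferred headings and treats terms as equivalent when they differ only in capitalization, periods, or spacing. Anything that does not match is flagged for a person to review rather than changed automatically.

```python
import re

def normalize_key(term):
    # Drop periods, collapse runs of whitespace, and case-fold, so that
    # "WABASH  college." and "Wabash College" produce the same key.
    return re.sub(r"\s+", " ", term.replace(".", "")).strip().casefold()

def build_authority(preferred_forms):
    """Map normalized keys to the preferred (authorized) heading."""
    return {normalize_key(p): p for p in preferred_forms}

def apply_authority(term, authority):
    """Return (heading, matched): the preferred form if known, else the term for review."""
    key = normalize_key(term)
    if key in authority:
        return authority[key], True
    return term.strip(), False
```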
Our actual scanning experiences have been relatively trouble-free. We soon
learned how to adjust the room lighting near the sensitive overhead scanner
and to keep reflective objects away from it to avoid interfering with the quality
of the scan. We also know now that we must never place the scanner too close
to a solid white wall because of the reflections. Our scanning assistant has
even learned to dress in dark colors on days when she is scanning light text
so that her clothes do not interfere with the image quality!
We have also adjusted our procedures. For example, finding duplicate records
in a database made us pay more attention to how DB/TextWorks can inspect imported
data and check for duplicates according to our specifications. A few minor losses
of electronic data taught us to adhere to a regular backup plan, too. And no
one has to warn us anymore about the dangers involved in performing a batch
modification or deletion in our database program before we have a backup. We
have learned a lot the hard way.
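DB/TextWorks handles duplicate checking on import through its own settings; the sketch below is not that feature, only a generic illustration of matching records "according to our specifications." It assumes records arrive as dictionaries (the field names are hypothetical) and treats two records as duplicates when the chosen key fields match after trimming spaces and ignoring case.

```python
def find_duplicates(records, key_fields):
    """Return groups of record indexes whose key fields match.

    Values are compared after str(), trimming, and case-folding.
    """
    groups = {}
    for i, rec in enumerate(records):
        key = tuple(str(rec.get(f, "")).strip().casefold() for f in key_fields)
        groups.setdefault(key, []).append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]
```

Choosing the key fields is the "specification": Matching on groom and bride alone merges two different marriages of the same couple, while adding the year keeps them apart.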
Barbarians at the Gate
Shortly after we set up our equipment in March 2000 came the day we spotted
barbarians at the gate: Termites were contentedly eating a volume on a storage
shelf next to our overhead scanner, in a room where we store our most valuable
artifacts. And as I was finishing this article, there was a mysterious fire
in a trash dumpster a few feet from a small window leading into the very same
room. The termites have been turned back, and the late-night fire scorched
just the outside of the building before it was fortunately discovered and extinguished,
but these incidents emphasize how easily something or someone can destroy our
collection. At moments like these we know that our task, in spite of its difficulty
and length, is worth the effort. Our director has already guaranteed that the
project will continue indefinitely beyond the original grant period. The library
will handle expenses for the maintenance of hardware, software, and even the
salary of a part-time staff person.
We have digitized only a small portion of our collection up to this point,
but at least we know that we are preserving the Athens of Indiana while rendering
its heritage more accessible than ever before. Few days go by when we don't
hear from researchers around the country who are thankful for what we have
accomplished and who look forward to more.
To Contact the Companies
Inmagic, Inc.
DB/SearchWorks
DB/TextWorks
DB/Text WebPublisher
200 Unicorn Park Dr.
Woburn, MA 01801
http://www.inmagic.com
Minolta Corp. USA
PS 7000
101 Williams Dr.
Ramsey, NJ 07446
http://www.minolta.com
eiStream (Eastman Software)
Imaging for Windows
2911 Turtle Creek Blvd., #1100
Dallas, TX 75219
http://www-11.eistream.com
Bill Helling is the systems librarian and Webmaster at the
Crawfordsville District Public Library in Crawfordsville, Ind. He holds an M.I.S.
from Indiana University. He also teaches library automation and information architecture
as an adjunct faculty member for Indiana University. His e-mail address is web@cdpl.lib.in.us.