


Computers in Libraries
 




Vol. 23 No. 6 — June 2003
Feature
Preserving the 'Athens of Indiana' Through Digitization
By Bill Helling

A few weeks after I came to the Crawfordsville District Public Library in Indiana as its first systems librarian, I was asked to spearhead a very ambitious digitization project. Two of our senior librarians, Mary Johnson and Judy Spencer, had initiated the project a year earlier, in 1998, after realizing that the local history collection was in danger: Some items were already damaged by frequent use, and some were lost due to either misplacement or theft. These librarians saw digitizing as an obvious way to preserve the originals. They were admittedly not able to handle any of the technical aspects of digitization, but they did their homework by visiting several regional libraries involved in such efforts. Though they did not find any public library of our size to use as a model, they requested a grant of $62,000 from our Montgomery County Community Foundation. The foundation approved the grant in early 1999, shortly after my arrival. All they needed at that point was to turn the grant proposal into a reality—and that was where I came in.

Checklist for Getting Started

• Take an inventory of your collection. This is a good time to fix cataloging problems, reunite items belonging together, weed, and prioritize.

• Gather any type of indexing already done, whether in book form or on cards. It may be ready for simple data entry.

• Search for any type of electronic data that may exist from previous indexing projects to see if this data can be imported into your database application.

• Establish authority control and indexing conventions that can be used to clean up existing indexes and to guarantee the consistency of future ones.

• Train current indexers on proper data entry in an application that they can manage and that you can import into a database.

• Establish a list of who is indexing what in order to avoid conflicts and to promote a sense of ownership.

• Assign someone to be responsible for supervising the data-entry process and collecting the new data from its various points of production.

• Perform clean-up duties before importing data into your database by using the strengths of the different applications that the data may arrive in. For example, use Excel to sort data by different fields (columns) in order to find obvious entry errors.

• Back up all electronic data on a regular basis and distribute it to safe places.
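The clean-up item in the checklist above (sorting by a column to expose obvious entry errors) can also be done in a short script once the data is exported as delimited text. The following is only a sketch under stated assumptions: the column names and the sample rows are invented for illustration and are not taken from the library's actual indexes.

```python
import csv
import io

# Invented sample data; the column names are assumptions for illustration.
SAMPLE = """surname,given_name,year
Smith,John,1905
Jones,Mary,1901
Brown,Henry,l881
Green,,1950
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def sort_by(rows, field):
    """Return rows ordered by one field, much as a spreadsheet column sort would."""
    return sorted(rows, key=lambda row: row[field])

def suspicious(rows, field, is_valid):
    """Return the rows whose value in `field` fails the validity check."""
    return [row for row in rows if not is_valid(row[field])]

def looks_like_year(value):
    """Accept only four-character values made up entirely of digits."""
    return len(value) == 4 and value.isdigit()

rows = load_rows(SAMPLE)

# Sorting pushes malformed values (here a letter "l" typed for "1") to one end.
for row in sort_by(rows, "year"):
    print(row["year"], row["surname"])

bad_years = suspicious(rows, "year", looks_like_year)
blank_names = suspicious(rows, "given_name", lambda value: value.strip() != "")
```

Sorting makes errors visible to a human reviewer, while the validity checks catch the same problems without anyone having to scan the column by eye.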

Surveying the Past Glory

The library serves the small town of Crawfordsville, with its 14,000 residents, as well as the surrounding area. This town has some real historical significance: It was an important portal in the 19th century for settlers moving between the East and the West. It is still the home of Wabash College, founded in 1832, and it was the residence of several now-forgotten but once well-published writers such as Lew Wallace, of Ben-Hur fame, and Maurice Thompson, who penned Alice of Old Vincennes. Henry S. Lane, a leading figure in the founding of the Republican Party and friend of Abraham Lincoln, lived two blocks from the present-day library. Such was the reputation of its inhabitants and its culture that Crawfordsville was referred to as the "Athens of Indiana"!

The library here is thus a repository for a large number of historical items, many dating back to the early days of the county. Our original building, the Carnegie Library, was dedicated in 1902, and today its 108,000 items fill not only the library structure but also a remodeled car dealership now connected to it. Our small Local History room can't hold much: We are using space on another floor, garage space left over from the car dealership, and part of the reference department for storage. Because of the popularity of the local history collection, our staff struggles to answer the constant mail, e-mail, and phone queries as well as to serve the researchers coming from all over the country.

Gathering the Right Tools

By the time the grant was awarded, we were at the point where we had to act to preserve our endangered heritage while making it easier to access. At the outset, we were simply planning for in-house access of our digitized images. We intended to scan for archival purposes any manuscript or item of historical interest and to burn these images on CDs so that we could store the items safely away. The archive image (a 600-dpi noncompressed TIFF image) would be suitable for printing. We were then prepared to save the image as a much smaller JPEG file (at 150 dpi) or Adobe PDF file for viewing on local computers. We were also planning to invest in a simple database program to help us keep track of the images.

We knew that we needed a special piece of equipment to scan many of the local history items. The pages of old volumes tend to crumble, and their inflexible spines split when opened too far. Many of the volumes are also several inches thick and are unwieldy because of their weight or size.

The grant request specified $18,000 for an overhead scanner (for face-up scanning), and I decided upon the Minolta PS 7000 for several reasons. First, this grayscale scanner covers a maximum area of 17 x 23 inches at 400 dpi (and 11 x 17 or less at 600 dpi), which was ideal for our oversized items. I was also satisfied by the way it works with Eastman Software's Imaging for Windows (Professional Edition) in straightening curved text and lightening the gutter area, an important consideration for scanning thick volumes. In addition, the surface of this overhead scanner is adjustable so that it can cradle the spine of a large work while allowing for automatic or manual focus.
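To give a feel for what these scan dimensions mean for storage, the arithmetic below estimates the raw size of a noncompressed grayscale image at the two area and resolution combinations mentioned above. It is a rough sketch: the 8 bits per pixel and the 700 MB CD capacity are assumptions, not figures from the project, and file headers are ignored.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Estimate raw pixel data size for one scan, ignoring file headers.

    bits_per_pixel=8 assumes 256-level grayscale (an assumption).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# The two area/resolution combinations quoted for the scanner.
full_bed = uncompressed_bytes(17, 23, 400)
tabloid = uncompressed_bytes(11, 17, 600)

# Assumed capacity of one recordable CD, in bytes (hypothetical figure).
CD_CAPACITY = 700_000_000
scans_per_cd = CD_CAPACITY // tabloid

print(f"17 x 23 in at 400 dpi: {full_bed / 1_000_000:.1f} MB")
print(f"11 x 17 in at 600 dpi: {tabloid / 1_000_000:.1f} MB")
print(f"Full-size 600-dpi scans per CD: {scans_per_cd}")
```

Under these assumptions a full-size archive scan runs to tens of megabytes, which explains why a much smaller derivative is needed for on-screen viewing.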

As for our database software, I decided that DB/TextWorks by Inmagic would serve us well because we could easily design custom databases around our different data. At the time, Inmagic was also offering DB/SearchWorks, which would enable us to share network copies of databases built with DB/TextWorks. In addition, DB/SearchWorks would allow users to view images attached to the records in the database.

Hardware, software, and supplies accounted for some $25,000 of the grant. However, none of the technology purchases would matter without the staffing to take advantage of them. The grant writers fortunately had figured in money to hire a part-time staff person for 3 years to perform scanning duties.

Rethinking the Approach

Even the simplest plans change once a project gets going, however. In working with Inmagic, I soon became aware of its DB/Text WebPublisher product. WebPublisher runs on a Web server and writes to HTML the results of a DB/TextWorks search—thus making the results viewable in any Web browser.

The chance to publish data on the Web caused me to reconsider the original plan. If we were going to produce a screen-viewable image after the archive image, why limit users to in-library viewing? Unfortunately, making these images available on the Web required a massive amount of storage space that our ISP would not allow. We also needed to install WebPublisher on the Web server and have access to its configuration files. I soon realized that we would have to run our own Web server if we really wanted to maximize access to our collection. Although WebPublisher and its maintenance costs were not figured into the grant request, our library director, Larry Hathaway, worked hard to free up the money from an already-tight library budget. As for the Web server, we were able to use a grant from the Library Services and Technology Act (LSTA) to purchase a Dell PowerApp server with enough power and storage space to suit our needs for years to come. In the meantime, I decided to keep the main library Web pages hosted on the ISP's server because, as the only systems librarian, I cannot possibly provide constant service for our Web gateway. However, I was willing to manage our Local History Web server, and I can simply link to the databases on this server from our ISP-hosted library Web site, http://www.cdpl.lib.in.us/lh.

By early 2000, the pieces were in place and we were ready to begin our preservation effort in earnest.

Preparing the Dig Site

Before we even made any plans, we knew that we had to first scan our most valuable "museum" pieces. These items included local business accounts from 1837 to 1839, Church of God records of 1842 to 1853, the Montgomery County Registry of Negroes and Mulattoes of 1853, some miscellaneous records of William Bratton (a member of the Lewis and Clark expedition who settled near here in 1822), and so on. After we were satisfied that the most important items were digitally preserved and then safely stored, we felt able to move on to a more systematic approach.

Unfortunately, no single person knew the true scope of our holdings. The library was automated, but the local history collection was not entered into the system. The old card catalog could have been written in an ancient language for all the help it was. A volunteer of many years, Dellie Craig, had the best understanding of the collection. We met repeatedly until we had mapped what we possessed and where it was buried. At this point we realized that we had so much more than just historic volumes and manuscripts to preserve as images: We had inherited the labors of previous library staff and volunteers who had been indexing for decades our newspapers, local magazines, and other resources. As a result, the library had many shelves of typed indexes and drawers crammed full of handwritten catalog cards. We would have to enter all the hard-copy material into our DB/TextWorks databases, of course, but using these ready-made sources was a natural starting point before digging deeper.

After some deliberation, we decided to produce two main types of databases. One type simply points to records when an image of an item represented by the record is unnecessary or impossible. For example, our cemetery database, which volunteers are transcribing directly from nearly 100,000 handwritten cards, is a reading of every headstone erected in the county up until 1981. This database most definitely does not need to lead to an image of every headstone.

However, for some other databases, a link to an image is often possible—and even necessary. We have a set of marriage records from 1888 to 1941 indexed only by groom and bride names. Each record links to an image of the actual page so that researchers can also discover parents' names, places of marriage, occupations, and so on. The only alternative to using an image link would be for us to transcribe everything mentioned on each marriage record!
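The difference between the two database types can be pictured as a record whose image link is optional. The sketch below is purely illustrative: the class, field names, and file name are hypothetical and do not reflect how DB/TextWorks stores its records.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IndexRecord:
    """One index entry; image_file stays None for pointer-only databases."""
    database: str
    fields: dict = field(default_factory=dict)
    image_file: Optional[str] = None  # hypothetical name of a linked page scan

    def has_image(self):
        """True when the record links to a scanned page image."""
        return self.image_file is not None

# A cemetery entry only points to information; no image is attached.
headstone = IndexRecord("cemetery", {"surname": "Smith", "given_name": "John"})

# A marriage entry is indexed by names only and links to the page image,
# where a researcher can read the details that were not transcribed.
marriage = IndexRecord(
    "marriages",
    {"groom": "Smith, John", "bride": "Jones, Mary"},
    image_file="marriages_page_0001.jpg",  # invented file name
)

for record in (headstone, marriage):
    print(record.database, "has image:", record.has_image())
```

Keeping the indexed fields few and letting the image carry the remaining detail is the trade-off described above: less transcription, at the cost of full-text searching on those details.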

Collecting the Artifacts

Deciding which resources to digitize can often be difficult, given that we have so many items to choose from. But, aside from digitizing items that we feared losing the most, we base our decisions on what researchers need to use and tend to request. At any given moment we have library staff and volunteers entering hard-copy data, either from an existing print index or from the original item, into a program that they feel comfortable using, such as Microsoft Word, Excel, or even Access. It is just not possible, or desirable, to have everyone enter data directly into our database program.

Our part-time staff person subsidized by the grant does all our scanning for archival and display purposes, and we both enter data directly into DB/TextWorks. We work together to pool the images and data that we have, and we always keep several different databases going at one time. Although we have tried, it is almost impossible to predict how long it is going to take us to dig up and dust off our data because of so many factors, including how busy I am with my other duties, the time the volunteers can give us, and the amount of accompanying scanned images, if any.

Encountering Problems, Learning Valuable Lessons

Some intrepid indexers began to use word processors several years before my arrival. Unfortunately, they considered word processing merely a convenient way to produce a nice print copy before discarding the electronic file! No need to keep those pesky files around after printing, right? I quickly ended this practice. But even the electronic data produced after my arrival (and now being saved) presented problems. For example, those using Word did not think about how to enter the data so that it could easily be moved into a database. Well-meaning volunteers indiscriminately used numerous tab stops, spaces, and hard returns to line up data into nice columns instead of using just one tab mark between columns or simply using a table. I now closely supervise all electronic data entry in order to minimize my conversion problems. I also know exactly what type of data-entry mistakes to look for, and I spend more time cleaning data before I begin moving it into DB/TextWorks. Nevertheless, I am willing to have volunteers work in their comfort zones in any format that I can later clean up and import. We can't succeed without volunteers.
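The kind of clean-up this calls for can be sketched in a few lines: text saved from a word processor, with columns lined up by mixed tabs and spaces, is collapsed so that exactly one tab separates each field. This is a minimal sketch, not the procedure actually used at the library, and it assumes that no genuine value contains a tab or two consecutive spaces. The sample line is invented.

```python
import re

# A column gap is any run of whitespace containing a tab, or two or more spaces.
# Assumption: real field values never contain a tab or a double space.
COLUMN_GAP = re.compile(r"[ \t]*\t[ \t]*| {2,}")

def split_columns(line):
    """Split one visually aligned line into its field values."""
    return [cell.strip() for cell in COLUMN_GAP.split(line.strip())]

def to_tab_delimited(line):
    """Rewrite the line with exactly one tab between fields."""
    return "\t".join(split_columns(line))

# Invented example of a line aligned with a mix of tabs and spaces.
messy = "Smith, John\t\t  1902   Journal"
print(split_columns(messy))
print(repr(to_tab_delimited(messy)))
```

Single spaces inside a value (as in "Smith, John") survive, which is why the rule looks for tabs or runs of at least two spaces rather than splitting on every space.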

The hard-copy data compiled over many years was also a source of many problems. Some indexers created authority files but rarely updated them before putting them aside. Nothing was carved in stone: spelling, capitalization, and anything else that could evolve into different forms did so. These inconsistencies were showing up all too well in digital form, so we began to place more effort into establishing and following indexing conventions.
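One way to enforce such conventions on existing data is to map every known variant of a heading to a single authorized form and to flag anything the authority list does not recognize. The sketch below assumes a hand-built lookup table; the variants shown are invented examples, and the normalization rules (case, spacing, trailing periods) are only one possible choice.

```python
def normalize_key(text):
    """Collapse whitespace, drop trailing periods, and lowercase for lookup."""
    return " ".join(text.split()).rstrip(".").lower()

# Hypothetical authority table: normalized variant -> authorized heading.
AUTHORITY = {
    "wabash college": "Wabash College",
    "wabash coll": "Wabash College",
}

def to_authorized(heading):
    """Return (authorized form, True), or (original heading, False) if unknown."""
    key = normalize_key(heading)
    if key in AUTHORITY:
        return AUTHORITY[key], True
    return heading, False

for raw in ["WABASH  COLLEGE", "Wabash Coll.", "Wabash Colege"]:
    heading, known = to_authorized(raw)
    print(f"{raw!r} -> {heading!r} ({'ok' if known else 'needs review'})")
```

Unknown headings are returned unchanged and marked for review rather than guessed at, so a person still decides whether a new variant belongs in the authority list.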

Our actual scanning experiences have been relatively trouble-free. We soon learned how to adjust the room lighting near the sensitive overhead scanner and to keep reflective objects away from it to avoid interfering with the quality of the scan. We also know now that we can never place the scanner too close to a solid white wall because of the reflections. Our scanning assistant has even learned to dress in dark colors on days when she is scanning light text so that her clothes do not interfere with the image quality!

We have also adjusted our procedures. For example, finding duplicate records in a database made us pay more attention to how DB/TextWorks can inspect imported data and check for duplicates according to our specifications. A few minor losses of electronic data taught us to adhere to a regular backup plan, too. And no one has to warn us anymore about the dangers involved in performing a batch modification or deletion in our database program before we have a backup. We have learned a lot the hard way.
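Duplicate checking of this sort can also be done before import, on the cleaned data itself. The sketch below groups records by a chosen set of key fields and reports any group with more than one member. It illustrates the general idea only and makes no claim about how DB/TextWorks performs its own check; the field names and records are invented.

```python
from collections import defaultdict

def find_duplicates(records, key_fields):
    """Group record positions by the chosen fields; keep groups larger than one.

    Values are compared case-insensitively with surrounding spaces removed.
    """
    groups = defaultdict(list)
    for position, record in enumerate(records):
        key = tuple(record.get(name, "").strip().lower() for name in key_fields)
        groups[key].append(position)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

# Invented records: the first and third differ only in case and spacing.
records = [
    {"groom": "Smith, John", "bride": "Jones, Mary", "year": "1902"},
    {"groom": "Brown, Henry", "bride": "Green, Ann", "year": "1910"},
    {"groom": "smith, john ", "bride": "Jones, Mary", "year": "1902"},
]

duplicates = find_duplicates(records, ["groom", "bride", "year"])
for key, rows in duplicates.items():
    print("Possible duplicate:", key, "at rows", rows)
```

Choosing which fields make up the key is the real specification work: too few fields flags distinct people as duplicates, too many lets near-identical entries slip through.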

Barbarians at the Gate

Shortly after we set up our equipment in March 2000 came the day we spotted barbarians at the gate: Termites were contentedly eating a volume on a storage shelf next to our overhead scanner—in a room where we store our most valuable artifacts. And as I was finishing this article, there was a mysterious fire in a trash dumpster a few feet from a small window leading into the very same room. The termites have been turned back, and the late-night fire scorched just the outside of the building before it was fortunately discovered and extinguished, but these incidents emphasize how easily something or someone can destroy our collection. At moments like these we know that our task, in spite of its difficulty and length, is worth the effort. Our director has already guaranteed that the project will continue indefinitely beyond the original grant period. The library will handle expenses for the maintenance of hardware, software, and even the salary of a part-time staff person.

We have digitized only a small portion of our collection up to this point, but at least we know that we are preserving the Athens of Indiana while rendering its heritage more accessible than ever before. Few days go by when we don't hear from researchers around the country who are thankful for what we have accomplished and who look forward to more.

 

To Contact the Companies

Inmagic, Inc.
DB/SearchWorks
DB/TextWorks
DB/Text WebPublisher
200 Unicorn Park Dr.
Woburn, MA 01801
http://www.inmagic.com

Minolta Corp. USA
PS 7000
101 Williams Dr.
Ramsey, NJ 07446
http://www.minolta.com

eiStream (Eastman Software)
Imaging for Windows
2911 Turtle Creek Blvd., #1100
Dallas, TX 75219
http://www-11.eistream.com

 


Bill Helling is the systems librarian and Webmaster at the Crawfordsville District Public Library in Crawfordsville, Ind. He holds an M.I.S. from Indiana University. He also teaches library automation and information architecture as an adjunct faculty member for Indiana University. His e-mail address is web@cdpl.lib.in.us.