FEATURE
Open Source Digital Image Management Took Us from
Raging Rivers to Quiet Waters
BY Isaac Hunter Dunlap
It all started with a phone call. In September 2003, Kathy Nichols contacted
me seeking suggestions, recommendations--really anything I could think of--that
might help her seize control of a bewildering and rapidly surging torrent of
digital image files. As the technical assistant tasked with organizing the
Archives and Special Collections' photo collections at Western Illinois University
(WIU) Libraries, Kathy had already contacted archives around the state and
found that others were also struggling to stay afloat amid a stream of digital
management issues.
The challenges that Kathy described interested me, and I felt they might
be addressed by the creative application of open source technology. Having
recently developed several database-driven Web applications in my new role
as coordinator of information systems for the libraries, I suggested that,
by using local resources and expertise, it might be possible to develop a digital
image management system that could dramatically improve the situation. Anyone
who has ever interacted with library special collections, however, knows that
by their very nature these wondrous treasuries of realia are altogether unique.
So I was aware of the need to tread carefully in prescribing any specific technological
solution.
So Much Culture Between the Rivers
Holding items as diverse as early Mormon correspondence, French Utopian
Icarian ribbons, and singer Burl Ives' garments, the Archives and Special Collections
Unit of the WIU Libraries collects and preserves materials related to the history
and culture of the university and the 16 counties of West Central Illinois.
This predominantly rural region lies between the city of Peoria and the Illinois
River to the east and the Mississippi River to the west, about 2.5 hours north
of metropolitan St. Louis.
The archival unit holds thousands of photographs that document virtually
every aspect of the university and the region. Some of the images in the archives'
possession lie on glass plate negatives from a bygone era. Most of the photographs
are organized in folders arranged by geographical location and by subject.
Existing negatives are filed and linked to corresponding images by a numerical
system. Drawing its visitors from the university's 12,000 students and the
active local communities surrounding the Macomb campus, the archives serve as
a primary resource for regional research and for reproductions of historical
images.
A Confluence of Challenges
Beginning in the summer of 2002, the archives faced a new challenge when
WIU's Visual Productions Center (VPC) fully migrated from traditional wet processing
of 35mm film to exclusive digital photography. The archives had long relied
on VPC to provide photographic reproduction services. No longer would the archives
receive a fresh packet of negatives when a new collection was photographically
reproduced or duplicate prints were requested. VPC would now provide the requested
prints and forward a digital file as either a TIFF or JPEG.
It became immediately obvious that the unit needed some way of managing and
organizing a torrent of new digital resources. The staff purchased a proprietary
digital image management product hoping that it could cope with the flood.
The PC-based software initially performed adequately, but problems soon emerged.
The program became notorious for crashing and losing data. The established
fields for cataloging the images did not lend themselves to customization,
so descriptive terms were often entered under inappropriate category headings.
Since the program sat on a single computer workstation, only one staff member
could catalog or search for images at a time. The computer's location in a
staff work area also precluded the possibility of enabling library patrons
to perform their own searches.
Security and preservation issues also quickly emerged. With no integrated
backup system in place, the image files were simply stored on the computer's
hard drive. Files were occasionally burned to CDs, but there was a growing
concern about the sustainability of this digital collection without a more
robust storage mechanism. Given these many problems, perhaps it is not surprising
that the software also did not provide an export feature! Once the data was
cataloged in the program, there was no way to migrate descriptive records to
a new system. Perhaps by a sheer stroke of luck, the program reached its maximum
cataloging capacity in only 1 year. By summer 2003, the existing software simply
would not catalog any more images.
We Were Paddling Upstream
It was at this low point that the unit was compelled to look for more attractive
digital management alternatives. I met with dean of libraries James Huesmann,
unit coordinator Frank Goudy, archives specialist Marla Vizdal, Kathy, and
unit staff. We discussed alternatives and weighed the pros and cons of a rather
narrow list of options. Amid a statewide budget crunch, purchasing an enterprise-level
Digital Object Management (DOM) system was out of the question. Besides, our
statewide academic library consortium was developing a proposal to purchase
an enterprise-level DOM system for all 65 member institutions. The complex
bidding and negotiation process wasn't even close to beginning, however, and
the actual implementation timeline was sketchy but appeared to be at least
2 years out. Even if we could afford a new DOM system, it made little sense
for us to pursue it apart from our consortium.
The facts remained that our digital software product had failed, our existing
digital photo collection was at risk, new digital files continued to pour in
from VPC, and we had minimal descriptive access to (or control of) either our
print or digital photo collections.
Some of our choices were not attractive: We could always do nothing, an
undesirable option that would likely send the archives staff drifting
toward despair. Treading water would be difficult as we waited for the consortium's
DOM to be selected, negotiated, and fully implemented statewide. We could perhaps
write a grant or seek outside funding for a high-quality DOM system, but the
forthcoming consortial solution argued strongly against it. Or we could try
to pursue an upgraded version of our current software or a different off-the-shelf
product. Regardless of the cost, we had found few well-regarded alternatives,
and there was little enthusiasm for continuing to pursue single-user commercial
products with ambiguous prospects for success.
A New Course: Planning for Digital Management
Ultimately we selected a course of action that in some respects was the most
ambitious, yet it also was a realistic and responsible means of gaining control
of an overwhelming, disorganized collection. We decided to simultaneously take
steps to address the unit's short- to middle-term challenges while repositioning
the digital collection to take advantage of our consortium's DOM system when
it materialized. The archival staff wanted me to build a digital management
system.
I proposed developing a Web-based system built primarily with two open source
tools: PHP (http://www.php.net) and MySQL (http://www.mysql.com). Having used
these resources to develop a heavily searched database of the library's print
and electronic periodical holdings, I knew these technologies were dependable
and up to the task.
PHP offers a natural, almost conversational Web scripting syntax that is simple
to learn. Easily embedded within HTML, PHP is only slightly more difficult to
hand-code than HTML itself.
The language's almost seamless connectivity with the powerful and lightning-fast
MySQL database management system (used by the likes of NASA and Google) makes
the combination ideal for developing reliable and scalable Web applications
that can store, access, and present information. Seeking interoperability and
compliance with recognized metadata guidelines for encoding textual data, I
proposed using Dublin Core Metadata standards for the collection's descriptive
elements.
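To give a concrete feel for the combination, here is a minimal sketch of a PHP
script embedded in an HTML page that reads from a MySQL table. It uses the
mysql_* functions current in PHP at the time, and the host, account, database,
and "images" table names are hypothetical placeholders rather than our
production setup:

<html>
<body>
<h1>Recent Additions</h1>
<?php
// Connect to MySQL (hypothetical host, account, and database names).
$link = mysql_connect("localhost", "archives_user", "secret")
    or die("Could not connect: " . mysql_error());
mysql_select_db("photo_archive", $link);

// Pull the five most recently cataloged images from a hypothetical table.
$result = mysql_query("SELECT identifier, title FROM images
                       ORDER BY identifier DESC LIMIT 5");

// Emit one HTML list item per database row.
echo "<ul>\n";
while ($row = mysql_fetch_assoc($result)) {
    echo "<li>" . htmlspecialchars($row["title"]) . "</li>\n";
}
echo "</ul>\n";
?>
</body>
</html>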
The new system would include a password-protected cataloging module for staff
use, with a separate public interface supporting Boolean keyword searching.
As we discussed the proposal, five major areas emerged that would define the
parameters of the new system:
1. Cost: The restricted budget demanded that development costs be
minimal (i.e., that we use free, open source software and existing hardware).
2. Storage: The system must have a secure means of automatically storing
and preserving large amounts of data.
3. Portability: The system must have "open export" capabilities and
be standards-based so that the data can easily migrate to new systems.
4. Access: The system's cataloging and search modules should be Web-based
to facilitate use by all archives staff and the public.
5. Scalability: The system must be able to adapt and grow with changing
and expanding content, work flows, and user needs.
It was relatively easy to resolve the storage and cost issues. A central
computing entity, University Computer Support Services (UCSS), maintains a
series of servers and storage systems for academic units throughout our campus
(including the libraries). UCSS quickly created a new user account with MySQL
and PHP functionality. Having UCSS host the database not only was cost-effective
but also provided sufficient storage capacity, and it meant that both the image
files and descriptive data would automatically be copied to tape daily via an
integrated backup storage system.
Finding a way to migrate the data from the old system to the new was more
difficult. Since the proprietary software on Kathy's desktop lacked an export
feature, we would have to manually copy and paste the individual fields from
a few hundred cataloged records into the new cataloging module's text input
fields. Though necessary in this situation, this tedious work was something we
wanted to perform exactly once, so it was absolutely critical for us to establish
the correct type and number of fields from the start. Following
the Dublin Core Metadata standard also gave us increased confidence that our
data would be well-ordered and ready to successfully migrate to the consortium's
future DOM system.
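Part of that confidence came from how naturally an "open export" follows once
the fields map onto Dublin Core. As a hedged illustration (the table and column
names below are hypothetical, not our actual schema), a short PHP routine can
walk the whole database and emit simple Dublin Core-tagged XML for a future
system to ingest:

<?php
// export_dc.php -- dump every record as simple Dublin Core XML.
// Assumes an open MySQL connection and a hypothetical "images" table
// whose columns roughly mirror Dublin Core elements.
header("Content-type: text/xml");
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
echo "<collection xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n";

$result = mysql_query("SELECT identifier, title, creator, date_created FROM images");
while ($row = mysql_fetch_assoc($result)) {
    echo "  <record>\n";
    echo "    <dc:identifier>" . htmlspecialchars($row["identifier"]) . "</dc:identifier>\n";
    echo "    <dc:title>" . htmlspecialchars($row["title"]) . "</dc:title>\n";
    echo "    <dc:creator>" . htmlspecialchars($row["creator"]) . "</dc:creator>\n";
    echo "    <dc:date>" . htmlspecialchars($row["date_created"]) . "</dc:date>\n";
    echo "  </record>\n";
}
echo "</collection>\n";
?>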
Poring over the Details
Now was the time for careful planning. I wanted to avoid future headaches
by fashioning a well-conceived data framework. Within a day or so, Kathy provided
a list of required fields along with the maximum number of characters per field
that she envisioned needing. After discussing the characteristics of the data,
we added a couple more fields. To ensure that we were considering all conceivable
scenarios, I repeatedly challenged our assumptions.
I made a separate recommendation that the Archives Unit establish a new procedure
for naming digital image files. For the new system to function correctly, we
needed every file to have a unique number that would be our key to associating
the images with their corresponding descriptive records. The original image
filenames often began with somewhat descriptive, but not necessarily unique,
phrases such as "jazzmusic_Al_Sears" or "macomb_bank2." Without an authoritative
classification system, duplicate filenames were a real possibility. Kathy
instituted a numeric naming convention that permanently
resolved this issue and gave us a reliable linking strategy.
Addressing issues such as these at the beginning of the design process took
time and effort. But it was essential to focus our
energies on making wise decisions up-front rather than later expending far
greater resources mitigating the effects of a poorly devised opening strategy.
Turning the Boat Around: OS Scripting and Testing
With our planning completed and our framework in place, I turned to the task
of constructing the system. The process of creating and normalizing the MySQL
database tables went smoothly. I frequently use a full-featured open source
administrative tool called phpMyAdmin (http://www.phpmyadmin.net) when setting
up new MySQL databases or tables. Written in PHP, this Web tool simplifies
MySQL table creation and modification. Instead of trying to perfectly enter
verbose database structures at MySQL's native command line, phpMyAdmin lets
me select field types from drop-down menus and click buttons to add or index
columns.
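For comparison, here is the sort of statement phpMyAdmin spares you from typing
at that command line. This is only a sketch: the table and column names are
hypothetical stand-ins, loosely echoing the Dublin Core elements we adopted,
not our production schema:

CREATE TABLE images (
    identifier   INT UNSIGNED NOT NULL PRIMARY KEY,  -- unique image number
    title        VARCHAR(255) NOT NULL,
    creator      VARCHAR(100),
    subject      VARCHAR(255),
    description  TEXT,
    date_created VARCHAR(50),
    filename     VARCHAR(100) NOT NULL,              -- name of the image file
    suppressed   TINYINT(1) NOT NULL DEFAULT 0,      -- hides unreleased records
    INDEX (title)
);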
With the database structure established, I began developing the cataloging
module using PHP and HTML. For the cataloging interface, Kathy needed the ability
to add, view, edit, delete, and search for records. I later added a "suppress" feature
so that records not yet ready for release could be temporarily hidden. For
the public search interface, I wanted to implement Boolean keyword searching
and provide a way to navigate results displayed across multiple pages.
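A hedged sketch of how that public search can work in PHP and MySQL appears
below. It is a minimal illustration, not our production code: the "images"
table, its columns, the included connection module, and the 20-results-per-page
figure are all assumed for the example, and each keyword is simply required
(a Boolean AND) to appear in the title or description:

<?php
// search.php -- Boolean (AND) keyword search with paged results.
include("db_connect.php");  // hypothetical module that opens the connection

$per_page = 20;
$page     = isset($_GET["page"]) ? max(1, (int) $_GET["page"]) : 1;
$offset   = ($page - 1) * $per_page;
$query    = isset($_GET["q"]) ? trim($_GET["q"]) : "";

// Require every keyword to appear in the title or description.
$clauses = array();
foreach (preg_split("/\s+/", $query) as $word) {
    $word      = mysql_real_escape_string($word);
    $clauses[] = "(title LIKE '%$word%' OR description LIKE '%$word%')";
}
$where = implode(" AND ", $clauses);

// Suppressed records stay hidden from the public interface.
$result = mysql_query("SELECT identifier, title FROM images
                       WHERE $where AND suppressed = 0
                       ORDER BY title LIMIT $offset, $per_page");

while ($row = mysql_fetch_assoc($result)) {
    echo "<p>" . htmlspecialchars($row["title"]) . "</p>\n";
}

// Offer the next page of results.
echo "<a href=\"search.php?q=" . urlencode($query) .
     "&amp;page=" . ($page + 1) . "\">Next page</a>\n";
?>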
From previous experience I had learned to avoid creating massive PHP scripts
filled with stanza upon stanza of functions and directives held together by
seemingly endless chains of "if-elseif-else" conditionals. Trying to debug a PHP
script that spans several hundred lines
is a nightmare. A far better approach is to create independent modules--small
segments of code called upon to efficiently perform specific tasks, such as
editing or deleting a database row. Since the modules stand alone, they can
be easily reused and enhanced for subsequent projects. Recycling existing code
saves time. Perhaps more importantly, modules (and PHP classes) proven to work "in
the real world" can effectively stand at the core of successive applications.
I generally use the PHP "include()" statement in my primary scripts to call
function libraries or modules. By using a modular approach for this project,
I was able to dramatically reduce the size of the primary script and avoid
unnecessary processes, resulting in a much speedier application.
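In practice, this means the primary script can be little more than a dispatcher.
Here is a minimal sketch of the pattern; the file and module names are
hypothetical, not the project's actual layout:

<?php
// catalog.php -- primary cataloging script; each task lives in a module.
include("lib/db_connect.php");   // opens the MySQL connection
include("lib/functions.php");    // shared helper functions

// Pull in only the module the current request needs, keeping the
// primary script small and avoiding unnecessary processing.
switch (isset($_GET["action"]) ? $_GET["action"] : "") {
    case "add":
        include("modules/add_record.php");
        break;
    case "edit":
        include("modules/edit_record.php");
        break;
    case "delete":
        include("modules/delete_record.php");
        break;
    default:
        include("modules/search_records.php");
}
?>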
Over the next few weeks, the cataloging and search interfaces quickly began
to take shape. As is necessary when nearing the conclusion of a project such
as this, I tested and retested the system. With careful review, I almost always
identify a bug or missing variable. After correcting every glitch I could find,
I put the system through its final paces in November 2003 before turning it
over to Kathy for the migration to begin.
Quieting the Waters
The laborious project of manually transferring the data to the appropriate
fields of the new interface soon commenced. On Nov. 21, 2003, Kathy cataloged
the new system's first digital image: a yellowed picture postcard from 1922
appropriately titled "Quiet Waters." The sleepy Spoon River, celebrated by
famed poet Edgar Lee Masters, is shown gently flowing through the nearby verdant
community of Babylon, Ill. Only 10 weeks had passed since Kathy's initial phone
call, and only 7 since we'd made the decision to build the system. Before semester's
end, Kathy had moved all of the data from the old software product into the
new system.
Our images continue to be entered at a remarkable pace. The system now holds
more than 2,400 cataloged digital images and can simultaneously be accessed
by the entire archives staff. We plan to officially launch this Web resource
for public use in 2005. The database is already believed to be one of the largest
digital resources of its kind among public university libraries in Illinois.
We now have a system to effectively manage our photo collections and to position
them for the future. We used to be overwhelmed by the flood of digital images,
but now OS technology has provided us the means to channel them into a steady,
manageable flow.
Further Reading
Castelli, Vittorio and Lawrence D. Bergman (2002). Image Databases: Search
and Retrieval of Digital Imagery (Wiley).
Gianna, Andrew (2002). "Database-Driven Web Sites: Taking Your Web Presence
to the Next Level" (http://www.techsoup.org/howto/articlepage.cfm?ArticleId=371).
Sklar, David and Adam Trachtenberg (2002). PHP Cookbook (O'Reilly).
Ullman, Larry (2004). PHP for the World Wide Web. 2nd ed. (Peachpit
Press).
Welling, Luke and Laura Thomson (2003). PHP and MySQL Web Development.
2nd ed. (Sams).
Yank, Kevin (2004). Build Your Own Database Driven Website Using PHP & MySQL.
3rd ed. (SitePoint).
Seven PHP Coding Tips
1. Use a quality freeware text editor (e.g., HTML-Kit) with line numbering
and automatic color-coding that distinguishes interspersed PHP and HTML
code.
2. For efficiency and security, don't explicitly code database server
passwords into your scripts. Use PHP variables to import the password info
(or better, call the entire connection sequence into your script) from a secure
file outside your public Web directory (see the sketch following this list).
3. Improve debugging and script runtime by dividing tasks (e.g., editing,
deleting rows) into distinct modules. Use the PHP "include()" statement in
your primary script to call modules as needed.
4. Add succinct, descriptive comments while coding. This will help speed
the updating of modules you've not recently touched.
5. Establish a style for naming database tables and columns and stick
to it! One coding/debugging aid: Reuse a table's name as a prefix within its
column names to make associations clear (e.g., table = "frog"; the jokes column
in "frog" = "frog_jokes").
6. Store digital images on a regular file server instead of trying to insert
them into your MySQL database. Store only the filenames in your database and
create links to these files via PHP (again, see the sketch following this list).
7. Even if your library does not maintain or have access to a Web server,
many commercial ISPs offer affordable hosting services with PHP/MySQL support.
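As promised in tips 2 and 6, here is a minimal sketch of both ideas in
combination. Every path, credential, table, and column name below is a
hypothetical placeholder:

<?php
// db_connect.php -- kept OUTSIDE the public Web directory (tip 2),
// so scripts include the connection rather than embed the password.
$link = mysql_connect("localhost", "archives_user", "secret")
    or die("Could not connect: " . mysql_error());
mysql_select_db("photo_archive", $link);
?>

<?php
// show_image.php -- a public script. The image files live on a regular
// file server, and MySQL stores only each file's name (tip 6).
include("/home/archives/private/db_connect.php");

$id     = (int) $_GET["id"];
$result = mysql_query("SELECT title, filename FROM images
                       WHERE identifier = $id");
if ($row = mysql_fetch_assoc($result)) {
    // Build the link to the stored file via PHP.
    echo "<img src=\"/photos/" . htmlspecialchars($row["filename"]) .
         "\" alt=\"" . htmlspecialchars($row["title"]) . "\" />\n";
}
?>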
Isaac Hunter Dunlap is an associate professor and coordinator of information
systems at Western Illinois University Libraries in Macomb. He holds an M.L.I.S.
from the University of North Carolina-Greensboro. His e-mail address is
ih-dunlap@wiu.edu.