FEATURE
Open Source Digital Image Management Took Us from
Raging Rivers to Quiet Waters
BY Isaac Hunter Dunlap
It all started with a phone call. In September 2003, Kathy Nichols contacted
me seeking suggestions, recommendations--really anything I could think of--that
might help her seize control of a bewildering and rapidly surging torrent of
digital image files. As the technical assistant tasked with organizing the
Archives and Special Collections' photo collections at Western Illinois University
(WIU) Libraries, Kathy had already contacted archives around the state and
found that others were also struggling to stay afloat amid a stream of digital
management issues.
The challenges that Kathy described interested me, and I felt they might
be addressed by the creative application of open source technology. Having
recently developed several database-driven Web applications in my new role
as coordinator of information systems for the libraries, I suggested that,
by using local resources and expertise, it might be possible to develop a digital
image management system that could dramatically improve the situation. Anyone
who has ever interacted with library special collections, however, knows that
by their very nature these wondrous treasuries of realia are altogether unique.
So I was aware of the need to tread carefully in prescribing any specific technological
solution.
So Much Culture Between the Rivers
Holding items as diverse as early Mormon correspondence, French Utopian
Icarian ribbons, and singer Burl Ives' garments, the Archives and Special Collections
Unit of the WIU Libraries collects and preserves materials related to the history
and culture of the university and the 16 counties of West Central Illinois.
This predominantly rural region lies between the city of Peoria and the Illinois
River to the east and the Mississippi River to the west, about 2.5 hours north
of metropolitan St. Louis.
The archival unit holds thousands of photographs that document virtually
every aspect of the university and the region. Some of the images in the archives'
possession lie on glass plate negatives from a bygone era. Most of the photographs
are organized in folders arranged by geographical location and by subject.
Existing negatives are filed and linked to corresponding images by a numerical
system. Drawing its visitors from the university's 12,000 students and the
active local communities surrounding the Macomb campus, the archives serve as
a primary resource for regional research and for reproductions of historical
images.
A Confluence of Challenges
Beginning in the summer of 2002, the archives faced a new challenge when
WIU's Visual Productions Center (VPC) fully migrated from traditional wet processing
of 35mm film to exclusive digital photography. The archives had long relied
on VPC to provide photographic reproduction services. No longer would the archives
receive a fresh packet of negatives when a new collection was photographically
reproduced or duplicate prints were requested. VPC would now provide the requested
prints and forward a digital file as either a TIFF or JPEG.
It became immediately obvious that the unit needed some way of managing and
organizing a torrent of new digital resources. The staff purchased a proprietary
digital image management product hoping that it could cope with the flood.
The PC-based software initially performed adequately, but problems soon emerged.
The program became notorious for crashing and losing data. The established
fields for cataloging the images did not lend themselves to customization,
so descriptive terms were often entered under inappropriate category headings.
Since the program sat on a single computer workstation, only one staff member
could catalog or search for images at a time. The computer's location in a
staff work area also precluded the possibility of enabling library patrons
to perform their own searches.
Security and preservation issues also quickly emerged. With no integrated
backup system in place, the image files were simply stored on the computer's
hard drive. Files were occasionally burned to CDs, but there was a growing
concern about the sustainability of this digital collection without a more
robust storage mechanism. Given these many problems, perhaps it is not surprising
that the software also did not provide an export feature! Once the data was
cataloged in the program, there was no way to migrate descriptive records to
a new system. Perhaps by a sheer stroke of luck, the program reached its maximum
cataloging capacity in only 1 year. By summer 2003, the existing software simply
would not catalog any more images.
We Were Paddling Upstream
It was at this low point that the unit was compelled to look for more attractive
digital management alternatives. I met with dean of libraries James Huesmann,
unit coordinator Frank Goudy, archives specialist Marla Vizdal, Kathy, and
unit staff. We discussed alternatives and weighed the pros and cons of a rather
narrow list of options. Amid a statewide budget crunch, purchasing an enterprise-level
Digital Object Management (DOM) system was out of the question. Besides, our
statewide academic library consortium was developing a proposal to purchase
an enterprise-level DOM system for all 65 member institutions. The complex
bidding and negotiation process wasn't even close to beginning, however, and
the actual implementation timeline was sketchy but appeared to be at least
2 years out. Even if we could afford a new DOM system, it made little sense
for us to pursue it apart from our consortium.
The facts remained that our digital software product had failed, our existing
digital photo collection was at risk, new digital files continued to pour in
from VPC, and we had minimal descriptive access to (or control of) either our
print or digital photo collections.
Some of our choices were not attractive: We could always do nothing, an
undesirable option that would likely send the archives staff drifting
toward despair. Treading water would be difficult as we waited for the consortium's
DOM to be selected, negotiated, and fully implemented statewide. We could perhaps
write a grant or seek outside funding for a high-quality DOM system, but the
forthcoming consortial solution argued strongly against it. Or we could try
to pursue an upgraded version of our current software or a different off-the-shelf
product. Regardless of the cost, we had found few well-regarded alternatives,
and there was little enthusiasm for continuing to pursue single-user commercial
products with ambiguous prospects for success.
A New Course: Planning for Digital Management
Ultimately we selected a course of action that in some respects was the most
ambitious, yet it also was a realistic and responsible means of gaining control
of an overwhelming, disorganized collection. We decided to simultaneously take
steps to address the unit's short- to middle-term challenges while repositioning
the digital collection to take advantage of our consortium's DOM system when
it materialized. The archival staff wanted me to build a digital management
system.
I proposed developing a Web-based system built primarily with two open source
tools: PHP (http://www.php.net) and MySQL (http://www.mysql.com). Having used
these resources to develop a heavily searched database of the library's print
and electronic periodical holdings, I knew these technologies were dependable
and up to the task.
PHP offers a natural, almost conversational Web scripting syntax that is simple
to learn. Easily embedded within HTML, PHP is only slightly more difficult to
hand-code than HTML itself.
The language's almost seamless connectivity with the powerful and lightning-fast
MySQL database management system (used by the likes of NASA and Google) makes
the combination ideal for developing reliable and scalable Web applications
that can store, access, and present information. Seeking interoperability and
compliance with recognized metadata guidelines for encoding textual data, I
proposed using Dublin Core Metadata standards for the collection's descriptive
elements.
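To give a concrete feel for the combination, here is a minimal sketch of a PHP
script embedded in an HTML page that reads from a MySQL table. It uses the
mysql_* functions current in PHP at the time, and the host, account, database,
and "images" table names are hypothetical placeholders rather than our
production setup:

<html>
<body>
<h1>Recent Additions</h1>
<?php
// Connect to MySQL (hypothetical host, account, and database names).
$link = mysql_connect("localhost", "archives_user", "secret")
    or die("Could not connect: " . mysql_error());
mysql_select_db("photo_archive", $link);

// Pull the five most recently cataloged images from a hypothetical table.
$result = mysql_query("SELECT identifier, title FROM images
                       ORDER BY identifier DESC LIMIT 5");

// Emit one HTML list item per database row.
echo "<ul>\n";
while ($row = mysql_fetch_assoc($result)) {
    echo "<li>" . htmlspecialchars($row["title"]) . "</li>\n";
}
echo "</ul>\n";
?>
</body>
</html>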
The new system would include a password-protected cataloging module for staff
use, with a separate public interface supporting Boolean keyword searching.
As we discussed the proposal, five major areas emerged that would define the
parameters of the new system:
1. Cost: The restricted budget demanded that development costs be
minimal (i.e., that we use free, open source software and existing hardware).
2. Storage: The system must have a secure means of automatically storing
and preserving large amounts of data.
3. Portability: The system must have "open export" capabilities and
be standards-based so that the data can easily migrate to new systems.
4. Access: The system's cataloging and search modules should be Web-based
to facilitate use by all archives staff and the public.
5. Scalability: The system must be able to adapt and grow with changing
and expanding content, work flows, and user needs.
It was relatively easy to resolve the storage and cost issues. A central
computing entity, University Computer Support Services (UCSS), maintains a
series of servers and storage systems for academic units throughout our campus
(including the libraries). UCSS quickly created a new user account with MySQL
and PHP functionality. Having UCSS host the database not only was cost-effective
but also provided sufficient storage capacity, and it meant that both the image
files and descriptive data would automatically be copied to tape daily via an
integrated backup storage system.
Finding a way to migrate the data from the old system to the new was more
difficult. Since the proprietary software on Kathy's desktop lacked an export
feature, we would have to manually copy and paste the individual fields from
a few hundred cataloged records into the new cataloging module's text input
fields. Though necessary in this situation, this tedious work was something we
wanted to perform exactly once, so it was absolutely critical for us to establish
the correct type and number of fields from the start. Following
the Dublin Core Metadata standard also gave us increased confidence that our
data would be well-ordered and ready to successfully migrate to the consortium's
future DOM system.
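Part of that confidence came from how naturally an "open export" follows once
the fields map onto Dublin Core. As a hedged illustration (the table and column
names below are hypothetical, not our actual schema), a short PHP routine can
walk the whole database and emit simple Dublin Core-tagged XML for a future
system to ingest:

<?php
// export_dc.php -- dump every record as simple Dublin Core XML.
// Assumes an open MySQL connection and a hypothetical "images" table
// whose columns roughly mirror Dublin Core elements.
header("Content-type: text/xml");
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
echo "<collection xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n";

$result = mysql_query("SELECT identifier, title, creator, date_created FROM images");
while ($row = mysql_fetch_assoc($result)) {
    echo "  <record>\n";
    echo "    <dc:identifier>" . htmlspecialchars($row["identifier"]) . "</dc:identifier>\n";
    echo "    <dc:title>" . htmlspecialchars($row["title"]) . "</dc:title>\n";
    echo "    <dc:creator>" . htmlspecialchars($row["creator"]) . "</dc:creator>\n";
    echo "    <dc:date>" . htmlspecialchars($row["date_created"]) . "</dc:date>\n";
    echo "  </record>\n";
}
echo "</collection>\n";
?>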
Poring over the Details
Now was the time for careful planning. I wanted to avoid future headaches
by fashioning a well-conceived data framework. Within a day or so, Kathy provided
a list of required fields along with the maximum number of characters per field
that she envisioned needing. After discussing the characteristics of the data,
we added a couple more fields. To ensure that we were considering all conceivable
scenarios, I repeatedly challenged our assumptions.
I made a separate recommendation that the Archives Unit establish a new procedure
for naming digital image files. For the new system to function correctly, we
needed every file to have a unique number that would be our key to associating
the images with their corresponding descriptive records. The original image
filenames often began with somewhat descriptive, but not necessarily unique,
phrases such as "jazzmusic_Al_Sears" or "macomb_bank2." Without an authoritative
classification system, duplicate filenames were a real possibility. Kathy
instituted a numeric naming convention that permanently
resolved this issue and gave us a reliable linking strategy.
Addressing issues such as these at the beginning of the design process took
time and effort. But it was essential to focus our
energies on making wise decisions up-front rather than later expending far
greater resources mitigating the effects of a poorly devised opening strategy.
Turning the Boat Around: OS Scripting and Testing
With our planning completed and our framework in place, I turned to the task
of constructing the system. The process of creating and normalizing the MySQL
database tables went smoothly. I frequently use a full-featured open source
administrative tool called phpMyAdmin (http://www.phpmyadmin.net) when setting
up new MySQL databases or tables. Written in PHP, this Web tool simplifies
MySQL table creation and modification. Instead of trying to perfectly enter
verbose database structures at MySQL's native command line, phpMyAdmin lets
me select field types from drop-down menus and click buttons to add or index
columns.
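For comparison, here is the sort of statement phpMyAdmin spares you from typing
at that command line. This is only a sketch: the table and column names are
hypothetical stand-ins, loosely echoing the Dublin Core elements we adopted,
not our production schema:

CREATE TABLE images (
    identifier   INT UNSIGNED NOT NULL PRIMARY KEY,  -- unique image number
    title        VARCHAR(255) NOT NULL,
    creator      VARCHAR(100),
    subject      VARCHAR(255),
    description  TEXT,
    date_created VARCHAR(50),
    filename     VARCHAR(100) NOT NULL,              -- name of the image file
    suppressed   TINYINT(1) NOT NULL DEFAULT 0,      -- hides unreleased records
    INDEX (title)
);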
With the database structure established, I began developing the cataloging
module using PHP and HTML. For the cataloging interface, Kathy needed the ability
to add, view, edit, delete, and search for records. I later added a "suppress" feature
so that records not yet ready for release could be temporarily hidden. For
the public search interface, I wanted to implement Boolean keyword searching
and provide a way to navigate results displayed across multiple pages.
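A hedged sketch of how that public search can work in PHP and MySQL appears
below. It is a minimal illustration, not our production code: the "images"
table, its columns, the included connection module, and the 20-results-per-page
figure are all assumed for the example, and each keyword is simply required
(a Boolean AND) to appear in the title or description:

<?php
// search.php -- Boolean (AND) keyword search with paged results.
include("db_connect.php");  // hypothetical module that opens the connection

$per_page = 20;
$page     = isset($_GET["page"]) ? max(1, (int) $_GET["page"]) : 1;
$offset   = ($page - 1) * $per_page;
$query    = isset($_GET["q"]) ? trim($_GET["q"]) : "";

// Require every keyword to appear in the title or description.
$clauses = array();
foreach (preg_split("/\s+/", $query) as $word) {
    $word      = mysql_real_escape_string($word);
    $clauses[] = "(title LIKE '%$word%' OR description LIKE '%$word%')";
}
$where = implode(" AND ", $clauses);

// Suppressed records stay hidden from the public interface.
$result = mysql_query("SELECT identifier, title FROM images
                       WHERE $where AND suppressed = 0
                       ORDER BY title LIMIT $offset, $per_page");

while ($row = mysql_fetch_assoc($result)) {
    echo "<p>" . htmlspecialchars($row["title"]) . "</p>\n";
}

// Offer the next page of results.
echo "<a href=\"search.php?q=" . urlencode($query) .
     "&amp;page=" . ($page + 1) . "\">Next page</a>\n";
?>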
From previous experience I had learned to avoid creating massive PHP scripts
filled with stanza upon stanza of functions and directives held together by
seemingly endless chains of "if-elseif-else" conditionals. Trying to debug a PHP
script that spans several hundred lines
is a nightmare. A far better approach is to create independent modules--small
segments of code called upon to efficiently perform specific tasks, such as
editing or deleting a database row. Since the modules stand alone, they can
be easily reused and enhanced for subsequent projects. Recycling existing code
saves time. Perhaps more importantly, modules (and PHP classes) proven to work "in
the real world" can effectively stand at the core of successive applications.
I generally use the PHP "include()" statement in my primary scripts to call
function libraries or modules. By using a modular approach for this project,
I was able to dramatically reduce the size of the primary script and avoid
unnecessary processes, resulting in a much speedier application.
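In practice, this means the primary script can be little more than a dispatcher.
Here is a minimal sketch of the pattern; the file and module names are
hypothetical, not the project's actual layout:

<?php
// catalog.php -- primary cataloging script; each task lives in a module.
include("lib/db_connect.php");   // opens the MySQL connection
include("lib/functions.php");    // shared helper functions

// Pull in only the module the current request needs, keeping the
// primary script small and avoiding unnecessary processing.
switch (isset($_GET["action"]) ? $_GET["action"] : "") {
    case "add":
        include("modules/add_record.php");
        break;
    case "edit":
        include("modules/edit_record.php");
        break;
    case "delete":
        include("modules/delete_record.php");
        break;
    default:
        include("modules/search_records.php");
}
?>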
Over the next few weeks, the cataloging and search interfaces quickly began
to take shape. As is necessary when nearing the conclusion of a project such
as this, I tested and retested the system. With careful review, I almost always
identify a bug or missing variable. After correcting every glitch I could find,
I put the system through its final paces in November 2003 before turning it
over to Kathy for the migration to begin.
Quieting the Waters
The laborious project of manually transferring the data to the appropriate
fields of the new interface soon commenced. On Nov. 21, 2003, Kathy cataloged
the new system's first digital image: a yellowed picture postcard from 1922
appropriately titled "Quiet Waters." The sleepy Spoon River, celebrated by
famed poet Edgar Lee Masters, is shown gently flowing through the nearby verdant
community of Babylon, Ill. Only 10 weeks had passed since Kathy's initial phone
call, and only 7 since we'd made the decision to build the system. Before semester's
end, Kathy had moved all of the data from the old software product into the
new system.
Our images continue to be entered at a remarkable pace. The system now holds
more than 2,400 cataloged digital images and can simultaneously be accessed
by the entire archives staff. We plan to officially launch this Web resource
for public use in 2005. The database is already believed to be one of the largest
digital resources of its kind among public university libraries in Illinois.
We now have a system to effectively manage our photo collections and to position
them for the future. We used to be overwhelmed by the flood of digital images,
but now OS technology has provided us the means to channel them into a steady,
manageable flow.
Further Reading
Castelli, Vittorio and Lawrence D. Bergman (2002). Image Databases: Search
and Retrieval of Digital Imagery (Wiley).
Gianna, Andrew (2002). "Database-Driven Web Sites: Taking Your Web Presence
to the Next Level" (http://www.techsoup.org/howto/articlepage.cfm?ArticleId=371).
Sklar, David and Adam Trachtenberg (2002). PHP Cookbook (O'Reilly).
Ullman, Larry (2004). PHP for the World Wide Web. 2nd ed. (Peachpit
Press).
Welling, Luke and Laura Thomson (2003). PHP and MySQL Web Development.
2nd ed. (Sams).
Yank, Kevin (2004). Build Your Own Database Driven Website Using PHP & MySQL.
3rd ed. (SitePoint).
Seven PHP Coding Tips
1. Use a quality freeware text editor (e.g., HTML-Kit) with line numbering
and automatic color-coding that distinguishes interspersed PHP and HTML
code.
2. For efficiency and security, don't explicitly code database server
passwords into your scripts. Use PHP variables to import the password info
(or better, call the entire connection sequence into your script) from a secure
file outside your public Web directory (see the sketch following this list).
3. Improve debugging and script runtime by dividing tasks (e.g., editing,
deleting rows) into distinct modules. Use the PHP "include()" statement in
your primary script to call modules as needed.
4. Add succinct, descriptive comments while coding. This will help speed
the updating of modules you've not recently touched.
5. Establish a style for naming database tables and columns and stick
to it! One coding/debugging aid: Reuse a table's name as a prefix within its
column names to make associations clear (e.g., table = "frog"; the jokes column
in "frog" = "frog_jokes").
6. Store digital images on a regular file server instead of trying to insert
them into your MySQL database. Store only the filenames in your database and
create links to these files via PHP (again, see the sketch following this list).
7. Even if your library does not maintain or have access to a Web server,
many commercial ISPs offer affordable hosting services with PHP/MySQL support.
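As promised in tips 2 and 6, here is a minimal sketch of both ideas in
combination. Every path, credential, table, and column name below is a
hypothetical placeholder:

<?php
// db_connect.php -- kept OUTSIDE the public Web directory (tip 2),
// so scripts include the connection rather than embed the password.
$link = mysql_connect("localhost", "archives_user", "secret")
    or die("Could not connect: " . mysql_error());
mysql_select_db("photo_archive", $link);
?>

<?php
// show_image.php -- a public script. The image files live on a regular
// file server, and MySQL stores only each file's name (tip 6).
include("/home/archives/private/db_connect.php");

$id     = (int) $_GET["id"];
$result = mysql_query("SELECT title, filename FROM images
                       WHERE identifier = $id");
if ($row = mysql_fetch_assoc($result)) {
    // Build the link to the stored file via PHP.
    echo "<img src=\"/photos/" . htmlspecialchars($row["filename"]) .
         "\" alt=\"" . htmlspecialchars($row["title"]) . "\" />\n";
}
?>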
Isaac Hunter Dunlap is an associate professor and coordinator of information
systems at Western Illinois University Libraries in Macomb. He holds an M.L.I.S.
from the University of North Carolina-Greensboro. His e-mail address is
ih-dunlap@wiu.edu.