Opening Doors with Open Source

Vol. 20, No. 9 • October 2000

• FEATURE •
by Don Gourley

In Beauregard Parish, Louisiana, patrons access library catalogs and Web resources using Linux-based public access workstations at their local library. Developers at the Cushing/Whitney Medical Library at Yale University are leading a cooperative effort to build a free reference source to describe online journals and aggregator content. Here at the Washington Research Library Consortium (WRLC), located near Washington, DC, we are basing our new customizable and personalizable portal for Web resources on free software from the Apache organization.

Libraries large and small are starting to realize the opportunities that open source software can make possible. “Open source” is a buzz-phrase that is currently receiving a lot of attention in the software industry and, increasingly, in libraries. But what does it mean, and why is it important for libraries? In this article, I’ll try to answer those questions by drawing on our experiences using open source software at the WRLC to develop a variety of Web-based applications and services.

Peeking Through the Door
Even though it’s usually licensed for free, open source software is not the same as free software. There are free programs that aren’t open source, and there are costs associated with any software besides the initial licensing fee. Open source means that you are free to use, read, modify, and redistribute the source code and program as you wish. Librarians will recognize this distinction immediately. We strive to enable the free flow of information, but recognize the many costs associated with running a library.

Others have taken the library/open source analogy pretty far. See the Web sites I’ve listed in the accompanying sidebar for discussions of the open source movement and how it relates to libraries. I’ve also included pointers to some of the open source software mentioned in this article and to essays on the technical and social aspects of open source. They include convincing arguments that open source software can be more quickly and reliably developed and enhanced than software distributed in binary format only.

At the WRLC, however, it’s the practical benefits rather than the philosophy that have attracted us to open source software. The WRLC is a regional resource-sharing organization established by seven universities in the Washington, DC, metropolitan area to expand and enhance the information resources available to their students and faculty. As is true for libraries around the world, many of those resources are electronic and require software systems for access and management.

When we evaluate software to meet those needs, we are interested in whether it does the job and whether we can afford both the software license and the time and resources to set it up and maintain it. We will get the most affordable working solution, whether it is open source or not. Obviously, open source software has an immediate affordability advantage in our environment. It is usually free, but even if we pay for a package, the open source license allows us to deploy that software on as many computers and for as many users as we want.

How We Opened New Integration Opportunities
Finding a solution that meets our needs is a more difficult problem, sometimes at any price. The requirements of a library are pretty specific, and in our case we need to meet the needs of seven libraries. No longer will a single monolithic system meet all the requirements; we need to integrate various pieces together, and that calls for some degree of openness in each of the pieces. For example, our library catalog system is not open source, but the underlying database structure is open to us, allowing integration with other applications that can query the catalog database directly.

Open source gives us the opportunity for an even greater degree of integration. Even without modifying the source code (which we rarely do), by examining the code we can more easily see how the pieces of a program fit together and how we can plug our own “glueware” into it to connect multiple systems.

This is what we did with Prospero, an open source system for online document delivery. We chose to use Prospero because we were already using Ariel, which Prospero is designed to complement. Ariel, from the Research Libraries Group (http://www.rlg.org), is a commercial document delivery system for interlibrary loan that uses the Internet to transmit electronic documents to borrowing libraries where they can be printed out for patrons. Prospero lets us streamline the process for consortium member libraries by converting Ariel documents to Adobe PDF and posting them to a secure Web site.

But Prospero comes with its own user file and authentication mechanisms. We already had a patron information system where we combined data from library catalogs and our consortial loan system. We just wanted to add patrons’ online documents from Prospero to our system. By examining the source code for Prospero, we discovered how to disable user file updates from the Prospero client and create our own user file from the patron information system. When patrons log in to our patron information system, it can automatically log them in to Prospero so they can view and retrieve their documents.

This integration was made simple because we could see exactly how Prospero worked by looking at its source code. But it also required custom programming for our patron information system in order to create and maintain the Prospero user file and give patrons access to Prospero. Developing this kind of glueware has allowed the WRLC to migrate to an environment in which all our services are delivered over the Web in pages that integrate data from a variety of sources. And open source software has provided us with the tools to implement this migration.

This Way Looks Familiar
The WRLC’s central public access information service is known as ALADIN (Access to Library and Database Information Network). It includes an online library catalog, article databases, image and other multimedia collections, and other resources. Since the mid-90s this service has been available on the Web. Originally, this meant that ALADIN provided Web interfaces to our catalog system and to locally mounted article and image databases. But over time the locally mounted information resources have been replaced by remote database services, and the catalog system has required various add-on services and programs.

The WRLC needed a way to manage the integration and development of this growing web of resources and services. Without consciously adopting the philosophy, we turned more and more often to open source tools. This was natural since we were building on top of the Web, and the Web is built on top of the Internet, and the Internet is built on, of course, open source software. Much of the basic software running the Internet is open source, from BIND (the name server software that maps host names to IP addresses) to sendmail (the mail transfer agent for most e-mail on the Internet). Mosaic, the first widely used graphical Web browser, was distributed with its source code, and the Apache Web server, which serves over 60 percent of the Web sites on the Internet, is open source. The open source door, it turned out, was already open.

The First Few Steps
Since we used Apache Web servers, it was natural to start there to look for tools that could extend our Web services. We were attracted to Apache JServ for several reasons: 1) it’s based on the same object-oriented programming language (Java) we were using to customize our OCLC SiteSearch system, 2) it’s highly integrated with the Apache Web server, and 3) it implements a robust framework for Web applications, including user session management and server load balancing.

The first JServ application we wrote was a simple system for uploading item bar codes from a portable scanner and producing shelf inventory reports by comparing the bar codes with the data in our library catalog. This application proved that we could use JServ to integrate data from our catalog system and external sources, and quickly put it on the Web. We were sufficiently excited to continue using JServ to develop increasingly complex and critical applications, including the patron information system referred to above, named (for lack of a more original title) myALADIN.
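The core of an inventory report like this is a comparison between two sets of bar codes: those the scanner read and those the catalog expects on the shelf. The following is a minimal sketch of that comparison, not the WRLC application itself. The class name, method names, and sample bar codes are all hypothetical, and it assumes the catalog's expected bar codes have already been fetched.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a shelf inventory comparison; not the WRLC code. */
class ShelfInventory {

    /** Bar codes the catalog expects on the shelf but the scanner never read. */
    static Set<String> missing(List<String> scanned, Set<String> expected) {
        Set<String> result = new LinkedHashSet<>(expected);
        result.removeAll(scanned);
        return result;
    }

    /** Bar codes that were scanned but that the catalog does not place on this shelf. */
    static Set<String> unexpected(List<String> scanned, Set<String> expected) {
        Set<String> result = new LinkedHashSet<>(scanned);
        result.removeAll(expected);
        return result;
    }

    public static void main(String[] args) {
        // Invented sample data: three scans, three catalog records.
        List<String> scanned = Arrays.asList("3001", "3002", "3999");
        Set<String> expected = new LinkedHashSet<>(Arrays.asList("3001", "3002", "3003"));

        System.out.println("Missing from shelf: " + missing(scanned, expected));
        System.out.println("Not expected here:  " + unexpected(scanned, expected));
    }
}
```

In a real report, each bar code would be joined back to its catalog record so that staff see titles and call numbers rather than raw numbers.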

Meanwhile, it was becoming clear that we needed to adapt our core ALADIN authentication and menuing system to the changing environment in which we found ourselves. SiteSearch serves us well as a platform for delivering locally mounted databases. But now our system had become more of a portal to a variety of local and remote resources. As we attempted to integrate more disparate information resources, we found that our custom code was increasingly difficult to maintain when we added resources or upgraded SiteSearch. Adding a database to the Web menus—a frequent occurrence—required manual, error-prone changes to four or five initialization files and an mSQL database. Adding or changing subject menus was even worse.

SiteSearch is a highly extensible system, but there were still things that we couldn’t get it to do. For example, we wanted to avoid forcing patrons to log in multiple times when they accessed multiple services. Once they logged in to ALADIN, they shouldn’t have to log in to the proxy servers used to access some databases. But there was no way we could find to verify that the SiteSearch session ID we had saved in a cookie was valid. Perhaps if SiteSearch were open source we could have figured out a way to do this, but without that advantage, this past spring we decided to pull our patron authentication and database menu code out of SiteSearch and build our own system using JServ.
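A system that issues its own session IDs can also answer the question that was out of reach here: is the ID stored in this cookie still valid? The sketch below illustrates that idea under stated assumptions. The `SessionRegistry` class, its method names, and the in-memory map are hypothetical, not the ALADIN code. The current time is passed in as a parameter only to keep the example easy to test.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a session registry that other services can query. */
class SessionRegistry {
    private final Map<String, Long> expiryBySessionId = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long lifetimeMillis;

    SessionRegistry(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    /** Called after a successful login; returns the ID to store in the cookie. */
    String login(long nowMillis) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        StringBuilder id = new StringBuilder();
        for (byte b : bytes) {
            id.append(String.format("%02x", b)); // two hex digits per byte
        }
        expiryBySessionId.put(id.toString(), nowMillis + lifetimeMillis);
        return id.toString();
    }

    /** True only if this registry issued the ID and it has not expired. */
    boolean isValid(String sessionId, long nowMillis) {
        if (sessionId == null) {
            return false;
        }
        Long expiry = expiryBySessionId.get(sessionId);
        if (expiry == null) {
            return false;
        }
        if (nowMillis >= expiry) {
            expiryBySessionId.remove(sessionId); // drop the expired session
            return false;
        }
        return true;
    }

    /** Ends a session explicitly. */
    void logout(String sessionId) {
        if (sessionId != null) {
            expiryBySessionId.remove(sessionId);
        }
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry(30 * 60 * 1000L); // 30 minutes
        long now = System.currentTimeMillis();
        String id = registry.login(now);
        System.out.println("Fresh session valid: " + registry.isValid(id, now));
        System.out.println("Unknown ID valid:    " + registry.isValid("not-a-session", now));
    }
}
```

A proxy server, or any other service behind the portal, could call something like `isValid` with the cookie value and skip its own login prompt when the answer is yes.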

An Open Source Application Architecture
Despite our initial successes with JServ-based glueware applications, we were still learning how to use Java and JServ, and we weren’t sure our code was the most efficient or maintainable. With ALADIN we needed to make sure access was quick, since the site had to handle over 100 requests a minute during peak times. And we needed to be sure that customization changes for our libraries, such as adding databases or changing Web page layout, could be done quickly and easily without source code modifications or restarting the system.

One problem we had was separating the application logic from the HTML code. Our early systems had more lines of code emitting HTML tags than doing anything else. Any changes to the HTML formatting required modifying a Java class, recompiling it, and reinitializing the JServ zone to get the new class loaded. Again, open source software came to the rescue, as we found several free systems that encapsulate HTML in templates. Because the software was free, we downloaded three alternatives and tried them out to see which one worked best in our environment. We chose FreeMarker because it is a simple Java-based system that integrated well with JServ.
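The principle behind a template engine can be shown with a toy example: the HTML lives in a text template with named placeholders, and the Java code supplies only the values. The sketch below is not FreeMarker and does not use its API. The `TinyTemplate` class and its placeholder syntax are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy placeholder substitution; an illustration only, not the FreeMarker API. */
class TinyTemplate {
    // Matches placeholders of the form ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} with its HTML-escaped value, or nothing if the name is unknown. */
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            String replacement = (value == null) ? "" : escapeHtml(value);
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // In practice the template text would be read from a file that designers can edit.
        String template = "<h1>${title}</h1><p>${count} databases</p>";
        Map<String, String> values = new HashMap<>();
        values.put("title", "Arts & Humanities");
        values.put("count", "12");
        System.out.println(render(template, values));
    }
}
```

Because the template is plain text loaded at run time, changing the page layout means editing a file rather than modifying a Java class, recompiling it, and reloading it.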

FreeMarker then became part of our overall application architecture (see Figure 1), based on open source software tools. This architecture allowed us to quickly rewrite our portal system into a faster and more flexible system. Not only are the applications flexible, but open source software allows us to modify and enhance the architecture without making a big investment up front. A real advantage of open source software is the ability to fully evaluate software without making any financial commitment or dealing with sales people.

Supporting the Software
Of course, we have encountered problems using our open source software toolset. But the problems aren’t unlike those that come with any software. Probably the most serious problem has been the steep learning curve. Documentation quality is variable (as it is with commercial packages), and often you are left to figure out how things work by experimenting. With open source software, if you are familiar with the underlying technology you can always take a peek under the hood to try to figure out what’s going on. At WRLC, we aren’t going to get much by looking at the code for the Linux kernel, but we were able to learn a lot, for example, by looking at the Perl scripts that manage the Prospero Web document delivery. The more popular the software, the more help you can find to learn about it. So, you will find many books, Web sites, and other resources about Linux.

The more popular software packages also have more support options, including commercial support that can be purchased. Even the smaller open source software packages, however, have at least an e-mail list where you can get support from other users and, often, the developers themselves. JServ, for example, has a very active list to which the primary developer contributes frequently. Sometimes he posts a dozen messages a day responding to questions or problem reports. He doesn’t suffer fools gladly, and gets particularly annoyed by questions that are answered in the FAQ or other documentation available for JServ. But I am happy to tolerate a little grumpiness in exchange for access to the software developer.

Keeping the Doors Open
Those support options are helpful, and may be sufficient for libraries using open source applications like Prospero that work well and require little configuration. But doing the things we do at the WRLC with open source software, such as developing custom applications and integrating commercial, open source, and our own systems, requires in-house support. Without a certain level of expertise, it is impossible to even ask the kinds of questions that might result in helpful responses on an e-mail list. We were able to make use of open source software because we had a small programming staff that had experience using UNIX and Web tools in general. Although open source tools allow the staff to develop and integrate applications more quickly, their time and expertise are not free, which is why there really is no “free” software.

However, if you have access to some in-house IT expertise, then open source software can open many doors of opportunity for your library. At the WRLC, we found that using open source software was not unlike using any other software; it’s just more flexible. That flexibility has given us the opportunity to adapt our systems to meet the changing requirements of the wired information world, and that’s a door we need to keep open.

Open Source Tools and Philosophy

Open Source Systems for Libraries
http://www.oss4lib.org
This site is specifically geared to open source software for use in libraries. The Projects page lists open source projects for library software. The Readings page includes pointers to essays about open source and libraries, as well as books on more general open source topics.

OpenSource.Org
http://www.opensource.org
An excellent place to learn about the history, attitude, and philosophy of the open source phenomenon.

Linux World
http://www.linuxworld.com
Linux is perhaps the most visible open source project today, and there are consequently hundreds of Web sites focused on it. This Web-only magazine is a good place to start, with news and technical information about Linux along with many links to other sites.

The Free Software Foundation
http://www.fsf.org
The Free Software Foundation, founded in 1985, sponsors the GNU project, which was launched in 1984 and is one of the longest-running free software efforts. Much of what you think of as Linux is really GNU software running on the Linux kernel.

The Apache Software Foundation
http://www.apache.org
It started as an HTTP daemon, but has grown to include a variety of open source software projects, including XML-Apache, mod_perl, and Java-Apache, which includes our favorite Java application server, JServ.

FreeMarker
http://freemarker.sourceforge.net
We use this open source HTML template engine to get data from Java servlets into Web pages while keeping HTML design separate from the application logic.

MySQL
http://www.mysql.com
Another key component of the WRLC application architecture is this open source relational database management system.

Prospero
http://bones.med.ohio-state.edu/prospero
This tool for sending ILL documents to the Web is open source software that libraries using Ariel can make use of quickly and easily.

Don Gourley is director of information technology and chief open source advocate at the Washington Research Library Consortium in Upper Marlboro, Maryland. He holds an M.S. in computer science from the University of Colorado–Boulder. His e-mail address is gourley@wrlc.org.
