How Does XML Help Libraries?

Vol. 22, No. 8 • September 2002

• FEATURE •
by Kyle Banerjee

Many technological innovations relating to libraries have been made over the past decade, but few have generated as much excitement as XML. While many new technologies seem more promising before they are implemented than they are afterward (remember how Java was going to make platform-independent software available everywhere, and Z39.50 was going to let us find and obtain materials stored in libraries around the globe so much more quickly?), XML is already transforming how information is managed and delivered.

Why Are Librarians So Excited About XML?

XML is significant because it makes it much easier to share and search resources that are in different formats. Until fairly recently, this wasn't much of an issue for libraries. Historically, libraries have served as centralized repositories of information. They purchased books, journals, films, and other information resources on physical media, and patrons found what the library owned by consulting a catalog that listed holdings. Most catalogs are designed with the assumption that once a library records some descriptive information about each resource it has purchased, this information won't have to be radically altered. For physical resources, this works pretty well, since the authors, titles, subjects, and physical characteristics of a book don't change.

Once access to the Internet became widespread, it became clear that providing access to remote electronic resources can be very problematic. Catalogs are designed to provide access to physical resources that are under direct control of the library. However, people want to read journal articles, books, and useful Web pages stored in dynamically updated databases that are maintained and owned by other organizations that might be thousands of miles away. Online library catalogs are poorly suited for providing access to these works, so many libraries do not include these types of resources in the catalog. As a result, it is often very difficult for patrons to know what electronic resources they can get through their libraries.

This is where XML comes into the picture. It is impossible to search or display information unless it is structured in a meaningful way. In plain English, this means that information providers need to agree on standards for encoding electronic documents so they can be retrieved in a uniform way. Libraries have encoded bibliographic records in MARC for many years, and that has allowed them to easily share catalog records, which reduces costs while improving services. For a variety of reasons, it is not feasible to encode the new types of resources patrons want access to in MARC. However, when the information is stored in XML, it is possible to share and combine that data in ways that would not otherwise be possible.

Here Are Some Things You Need to Know

<?xml version="1.0" encoding="UTF-8"?>
<MyPhotoArchive>
<photo title="Walking on the Beach at Sunset" filename="http://home.earthlink.net/~banerjek/images/SunsetBeach.jpg">
   <dogs>
       <name>Keiko</name>
   </dogs>
   <people>
      <name>Banerjee, Kyle</name>
   </people>
   <place>
      <country-state>OR</country-state>
      <city>Manzanita</city>
   </place>
   <date>1999-08-13</date>
</photo>
<photo title="Charming snakes" filename="http://home.earthlink.net/~banerjek/images/india/Snakes.jpg">
   <people>
      <name>Banerjee, Kyle</name>
      <name>Lincicum, Shirley</name>
   </people>
   <place>
      <country-state>India</country-state>
      <city>New Delhi</city>
   </place>
   <date>1998-08-26</date>
</photo>
</MyPhotoArchive>

Figure 1: A perfectly legitimate XML document

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>My Photo Example</title>
</head>
<body>
<table width="100%">
   <xsl:for-each select="MyPhotoArchive/photo[dogs/name='Keiko']">
      <tr>
         <td><b><xsl:value-of select="@title" /></b><br />
            <xsl:value-of select="place/city" />,
            <xsl:value-of select="place/country-state" />
         </td>
         <td>
           <img>
              <xsl:attribute name="src">
             <xsl:value-of select="@filename" /></xsl:attribute>
           </img>
         </td>
      </tr>
   </xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Figure 2: An XSL style sheet

When most people think of the things that can be done with XML, they are actually thinking about a family of related technologies rather than a single markup language. On a related note, it is better to think of XML as a grammar than as a language. XML establishes rules for defining new formats. In regular markup languages such as the HyperText Markup Language (HTML), authors must use certain tags that make the text bold, create clickable links, draw tables, apply style sheets, etc. With XML, authors are free to make up their own tags and attributes if they don't feel that those that have been created by others meet their needs. (See Figure 1.)
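The idea that XML is a grammar rather than a fixed vocabulary can be illustrated with a few lines of code. The sketch below is not part of the original article; it assumes Python and its standard-library parser, and the helper name is_well_formed is invented for illustration. A parser accepts made-up tags such as those in Figure 1 as long as they follow XML's rules, and rejects text that breaks those rules.

```python
import xml.etree.ElementTree as ET

# Invented tags in the spirit of Figure 1: none of them is predefined by XML.
well_formed = "<photo><dogs><name>Keiko</name></dogs></photo>"

# Same content, but the <dogs> element is never closed.
malformed = "<photo><dogs><name>Keiko</name></photo>"


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML.

    is_well_formed is a hypothetical helper written for this example.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(well_formed))  # True
print(is_well_formed(malformed))    # False
```

The parser never asks what "dogs" or "photo" means. It only checks that the tags nest and close correctly.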

It's important to recognize that XML only provides a structure for storing information. It does not say anything about how information is displayed, it does not create links that people can click on, and it doesn't bring information resources together by itself. If I want a Web-based photo archive that allows people to find pictures of my dog or trip to India, I still have to write or acquire a program with these capabilities. If someone else wants to develop an online photo archive that integrates my pictures with those stored in other repositories, she or he will also have to develop the software that will accomplish this task.

Putting XML into Action

To make XML do something useful, other technologies are necessary. The most significant of these is the Document Object Model (DOM). The DOM is a complex subject that is beyond the scope of a short article. In a nutshell, the DOM is an interface between programs and an XML document. The DOM doesn't care what language a program is written in—there's no particular reason why a program that uses the DOM couldn't be written in Java, C, BASIC, or any number of other languages. The DOM is very important because it allows programs to do useful things with XML such as finding, sorting, manipulating, and displaying information. For practical purposes, nonprogrammers do not need to worry about the technical details of DOM. Experienced programmers may find it relatively easy to use DOM to work with XML, but it's not a task for those without significant programming skills.
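For readers who do program, here is a rough idea of what working with the DOM looks like. This is a minimal sketch rather than a recommended design: it assumes Python and its standard xml.dom.minidom module, it uses a trimmed copy of the Figure 1 data (the filename attributes are left out), and the function names text_of, photos_with_dog, and titles_by_date are invented for this example.

```python
from xml.dom.minidom import parseString

# A trimmed copy of Figure 1 (filename attributes omitted for brevity).
ARCHIVE = """<MyPhotoArchive>
<photo title="Walking on the Beach at Sunset">
   <dogs><name>Keiko</name></dogs>
   <people><name>Banerjee, Kyle</name></people>
   <place><country-state>OR</country-state><city>Manzanita</city></place>
   <date>1999-08-13</date>
</photo>
<photo title="Charming snakes">
   <people><name>Banerjee, Kyle</name><name>Lincicum, Shirley</name></people>
   <place><country-state>India</country-state><city>New Delhi</city></place>
   <date>1998-08-26</date>
</photo>
</MyPhotoArchive>"""


def text_of(node):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def photos_with_dog(doc, dog_name):
    """Return the titles of photos whose <dogs> element names the given dog."""
    titles = []
    for photo in doc.getElementsByTagName("photo"):
        names = [
            text_of(name)
            for dogs in photo.getElementsByTagName("dogs")
            for name in dogs.getElementsByTagName("name")
        ]
        if dog_name in names:
            titles.append(photo.getAttribute("title"))
    return titles


def titles_by_date(doc):
    """Return all photo titles sorted by their <date> element, oldest first."""
    dated = [
        (text_of(photo.getElementsByTagName("date")[0]), photo.getAttribute("title"))
        for photo in doc.getElementsByTagName("photo")
    ]
    return [title for _, title in sorted(dated)]


doc = parseString(ARCHIVE)
print(photos_with_dog(doc, "Keiko"))  # ['Walking on the Beach at Sunset']
print(titles_by_date(doc))            # ['Charming snakes', 'Walking on the Beach at Sunset']
```

The same finding and sorting could be written against the DOM in Java, C, or another language; only the syntax around the calls would change.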

Nonprogrammers can also find uses for XML, but their options are more limited. Writing eXtensible Stylesheet Language (XSL) style sheets is a relatively easy task for anyone who has worked with any kind of style sheet, such as Cascading Style Sheets (CSS). The important difference between XSL and other types of style sheets is that XSL can perform calculations. XSL style sheets can selectively display or modify any element. For example, the style sheet in Figure 2 can be applied to the XML example in Figure 1 to create a small HTML table that displays in bold text the titles of all photographs containing my dog, along with the pictures. The location where each picture was taken is listed on the line below the title. (See Figure 2.)

As you can see from the example, XSL is relatively simple to use. It is particularly useful for converting XML to HTML and for selectively displaying and sorting data, though it has many other uses.

The Hype vs. the Reality

It's important to recognize what XML can do and what it can't. It is very easy to find articles explaining how XML is a universal format that will eliminate the need for proprietary data formats and problems associated with converting one type of data to another. XML is definitely a significant development that has attracted a great deal of interest on the part of libraries, the technical community, and vendors. However, for historical, technical, and social reasons, it seems unlikely that XML will ever become a universal language that integrates all information resources.

All XML does is provide a standard for defining containers that store information. This is extremely useful, because it simplifies the transfer of information from one system or program to another. Like other powerful technologies such as relational databases, the Structured Query Language, the World Wide Web, and HTML, XML is only a tool. Just as a word processor cannot write an interesting article by itself, XML cannot automatically find what people want, present information in an easy-to-read format, or solve problems relating to the content of information resources.

XML can't do your work for you, but it can be transmitted, manipulated, and arranged relatively easily by programmers. As a practical matter, there will never be enough time or money to unlock the full potential of all XML-encoded data. However, it is certainly possible to develop useful applications that involve specific resources such as a journal database, the online catalog, and some local archival finding aids. XML is particularly useful for presenting the same information for different users, since a style sheet can be used to format a Web-based news service for a businessperson with a wireless palmtop computer, a blind member of the community with a talking computer, or a college student in a computer lab.

XML technologies are very flexible because XML is meant to be a general-purpose tool. However, this versatility comes at the expense of computational efficiency. For this reason, XML is poorly suited for situations where the normal tool of choice would be a database. Databases physically format the data in such a way that it can be accessed with a minimum number of disk accesses and operations in memory. For small amounts of information, such considerations aren't important. However, certain operations, such as searching a million records and returning the titles written by a particular individual, can only be performed efficiently by applications that have been optimized for such tasks.
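To make the contrast concrete, here is a small sketch of the database side of that comparison. It is an illustration under stated assumptions, not a benchmark: it uses Python's standard sqlite3 module, and the table, index, author names, and titles are all generated placeholders invented for this example.

```python
import sqlite3

# An in-memory database with an index on the author column. The table and
# column names are hypothetical, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.execute("CREATE INDEX idx_author ON records (author)")

# 10,000 placeholder records spread across 1,000 placeholder authors.
rows = [(f"Author {i % 1000}", f"Title {i}") for i in range(10000)]
conn.executemany("INSERT INTO records (author, title) VALUES (?, ?)", rows)


def titles_by(connection, author):
    """Return the titles written by one author, in insertion order."""
    cursor = connection.execute(
        "SELECT title FROM records WHERE author = ? ORDER BY id", (author,)
    )
    return [title for (title,) in cursor]


# Ask SQLite how it intends to answer the author lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM records WHERE author = ?", ("Author 7",)
).fetchall()

print(len(titles_by(conn, "Author 7")))  # 10
print(plan[0][-1])                       # the plan names idx_author
```

The query plan shows the database going straight to the matching entries through the index. A program reading the same records from one large XML file would normally have to parse and examine every record to answer the same question.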

Practical Applications

For years, libraries have been quietly using XML to improve access to archival materials, simplify interlibrary loan processing, and enhance digital collections. Increased reliance on the Internet for delivering information resources has now brought XML into the mainstream, where its impact is starting to be felt by libraries of all sizes. As early as 1993, the library at the University of California, Berkeley started developing a method for encoding archival materials in SGML, the parent standard of XML (XML itself did not become a W3C Recommendation until 1998). The outcome of this project was the development of the Encoded Archival Description (EAD) standard, which is now maintained by the Library of Congress. Use of EAD has been increasing steadily over the years as a growing number of archival finding aids have been moved to the Web.

For the past several years, individual libraries have been improving their services and saving money by developing their own XML applications. Since 1998, Oregon State University has been using an application called InterLibrary Loan Automated Search And Print (ILL ASAP) to automatically search interlibrary loan requests and print request forms sorted by location and call number, complete with availability information, scannable Ariel addresses, shipping labels (if no Ariel address is present), and billing data customized to the borrowing library or consortium involved. This free application has been adopted by dozens of libraries around the country.

More-ambitious XML projects have also been successfully implemented. The Washington Research Library Consortium uses XML to provide access to subscription databases, digital collections, materials requested via interlibrary loan, and library catalogs that run on a combination of commercial, open source, and locally developed platforms. This system, known as ALADIN (Access to Library And Database Information Network), not only delivers content to seven academic research libraries, but also performs critical related tasks such as patron authentication using XML messages transmitted between applications over the Web.

In the spring of 2002, the Library of Congress announced an official specification for representing MARC data in an XML environment, MARC XML. Even though sharing data between catalogs is relatively easy because of widespread support for the MARC format, the ability to express MARC data in XML is useful for any library trying to develop tools or access mechanisms that combine MARC data (e.g., the online catalog) with non-MARC resources (e.g., a locally maintained database or special collection). Now that a standard has emerged for representing MARC, it is reasonable to assume that vendors and others will develop tools that take advantage of the huge amount of data already stored in MARC format. As a matter of fact, within a few weeks of the announcement of LC's specification, well-known tools used for manipulating MARC records such as JAMES (Java MARC Events) and MarcEdit already contained support for the new standard.
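As a rough illustration of why MARC data in XML is convenient to combine with other resources, the sketch below reads fields from a MARC record using nothing but a general-purpose XML parser. It assumes Python and the namespace used by the Library of Congress MARC XML schema. The record itself is hand-written for this example from the citation under Further Reading; it is not copied from any catalog, and the helper name subfields is invented.

```python
import xml.etree.ElementTree as ET

# Namespace of the Library of Congress MARC XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A hand-written, illustrative record (not taken from a real catalog).
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">XML in libraries /</subfield>
    <subfield code="c">edited by Roy Tennant.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">New York :</subfield>
    <subfield code="b">Neal-Schuman Publishers,</subfield>
    <subfield code="c">2002.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Tennant, Roy.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield with the given field tag and code.

    subfields is a hypothetical helper written for this example.
    """
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS)]


record = ET.fromstring(RECORD)
print(subfields(record, "245", "a"))  # ['XML in libraries /']
print(subfields(record, "700", "a"))  # ['Tennant, Roy.']
```

Because the record is ordinary XML, the same few lines could sit alongside code that reads a local database export or an EAD finding aid, with no MARC-specific parsing library involved.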

XML's Future in Libraries

As more libraries use XML, they are finding more uses for it. The eScholarship initiative at the California Digital Library not only uses XML to store books in a standardized format, but it also uses XML technologies to allow users to define their own displays. The Open Archives Initiative (OAI), an effort supported by OCLC, has developed a protocol that makes it easy to send a query to a database over the Web and receive the results in XML. OAI effectively makes it possible to perform searches of multiple databases simultaneously without the need for proprietary hooks into local databases. While this functionality is very similar to what was promised with Z39.50, OAI is much easier to implement, so the hope is that it will be widely used for many kinds of databases. [Editor's Note: For more on OAI, see Marshall Breeding's article on page 24.]
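Part of what makes the OAI protocol easy to implement is that a request is just an ordinary Web address with a few parameters, and the reply is XML. The sketch below builds such a request without sending it. It assumes Python; the repository address is a made-up placeholder and the function name list_records_url is invented, while ListRecords, metadataPrefix, from, set, and oai_dc are names defined by the protocol.

```python
from urllib.parse import urlencode

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build the URL for an OAI ListRecords request (nothing is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed on or after this date
    if set_spec:
        params["set"] = set_spec    # only records in this repository-defined set
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2002-01-01"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-01-01
```

A service that wants to cover several repositories would issue the same kind of request to each one and combine the XML records that come back, which is the step that requires additional software.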

Over the next few years, the impact of XML on libraries is certain to increase. More likely than not, it will not be obvious when XML is used to improve library services, much as it is not obvious what kind of hardware and software a library uses for its catalog. The simplicity and flexibility of XML make it possible to integrate services and resources in ways that would have been impossible just a few years ago. Vendors, libraries, and open source programmers are all interested in finding ways to search many kinds of resources with a single query, and XML represents a major step forward in making this goal a reality.

Further Reading
Those needing detailed information about specific library projects that use XML may wish to consult: Roy Tennant, ed. XML in Libraries. New York: Neal-Schuman Publishers, 2002. 175 pp.
Publication information is available at http://www.neal-schuman.com/db/0/290.html.

Kyle Banerjee is a library systems analyst for the Oregon State Library in Salem, Oregon. He has been developing XML-based applications since 1998. He has published several articles and delivered numerous presentations about applying technology in a library setting. He holds an M.A. in political science and an M.L.S., both from the University of Illinois at Urbana-Champaign. His e-mail address is banerjek@earthlink.net.
