Many technological innovations
relating to libraries have been made over the past decade, but few have
generated as much excitement as XML. While many new technologies seem more
promising before they are implemented than they are afterward (remember
how Java was going to make platform-independent software available everywhere,
and Z39.50 was going to let us find and obtain materials stored in libraries
around the globe so much more quickly?), XML is already transforming how
information is managed and delivered.
Why Are Librarians So Excited About XML?
XML is significant because
it makes it much easier to share and search resources that are in different
formats. Until fairly recently, this wasn't much of an issue for libraries.
Historically, libraries have served as centralized repositories of information.
They purchased books, journals, films, and other information resources
on physical media, and patrons found what the library owned by consulting
a catalog that listed holdings. Most catalogs are designed with the assumption
that once a library records some descriptive information about each resource
it has purchased, this information won't have to be radically altered.
For physical resources, this works pretty well, since the authors, titles,
subjects, and physical characteristics of a book don't change.
Once access to the Internet
became widespread, it became clear that providing access to remote electronic
resources can be very problematic. Catalogs are designed to provide access
to physical resources that are under direct control of the library. However,
people want to read journal articles, books, and useful Web pages stored
in dynamically updated databases that are maintained and owned by other
organizations that might be thousands of miles away. Online library catalogs
are poorly suited for providing access to these works, so many libraries
do not include these types of resources in the catalog. As a result, it
is often very difficult for patrons to know what electronic resources they
can get through their libraries.
This is where XML comes
into the picture. It is impossible to search or display information unless
it is structured in a meaningful way. In plain English, this means that
information providers need to agree on standards for encoding electronic
documents so they can be retrieved in a uniform way. Libraries have encoded
bibliographic records in MARC for many years, and that has allowed them
to easily share catalog records, which reduces costs while improving services.
For a variety of reasons, it is not feasible to encode the new types of
resources patrons want access to in MARC. However, when the information
is stored in XML, it is possible to share and combine that data in ways
that would not otherwise be possible.
Here Are Some Things You Need to Know
<?xml version="1.0" encoding="UTF-8"?>
<MyPhotoArchive>
  <photo title="Walking on the Beach at Sunset" filename="http://home.earthlink.net/~banerjek/images/SunsetBeach.jpg">
    <dogs>
      <name>Keiko</name>
    </dogs>
    <people>
      <name>Banerjee, Kyle</name>
    </people>
    <place>
      <country-state>OR</country-state>
      <city>Manzanita</city>
    </place>
    <date>1999-08-13</date>
  </photo>
  <photo title="Charming snakes" filename="http://home.earthlink.net/~banerjek/images/india/Snakes.jpg">
    <people>
      <name>Banerjee, Kyle</name>
      <name>Lincicum, Shirley</name>
    </people>
    <place>
      <country-state>India</country-state>
      <city>New Delhi</city>
    </place>
    <date>1998-08-26</date>
  </photo>
</MyPhotoArchive>
Figure 1: A perfectly legitimate XML document
<?xml version="1.0"?>
<!-- XSLT 1.0 style sheet: lists every photo in Figure 1 that includes the dog Keiko -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>My Photo Example</title>
      </head>
      <body>
        <table width="100%">
          <!-- One table row per photo whose <dogs> element names Keiko -->
          <xsl:for-each select="MyPhotoArchive/photo[dogs/name='Keiko']">
            <tr>
              <td><b><xsl:value-of select="@title" /></b><br />
                <xsl:value-of select="place/city" />,
                <xsl:value-of select="place/country-state" />
              </td>
              <td>
                <!-- Build the image tag from the photo's filename attribute -->
                <img>
                  <xsl:attribute name="src"><xsl:value-of select="@filename" /></xsl:attribute>
                </img>
              </td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Figure 2: An XSL style sheet
When most people think of the
things that can be done with XML, they are actually thinking about a family
of related technologies rather than a single markup language. On a related
note, it is better to think of XML as a grammar than as a language.
XML establishes rules for defining new formats. In regular markup languages
such as the HyperText Markup Language (HTML), authors are limited to a fixed
set of tags that make text bold, create clickable links, draw tables, apply
style sheets, and so on. With XML, authors are free to make up their own tags
and attributes if they don't feel that those that have been created by
others meet their needs. (See Figure 1.)
It's important to recognize
that XML only provides a structure for storing information. It does not
say anything about how information is displayed, it does not create links
that people can click on, and it doesn't bring information resources
together by itself. If
I want a Web-based photo archive that allows people to find pictures of
my dog or trip to India, I still have to write or acquire a program with
these capabilities. If someone else wants to develop an online photo archive
that integrates my pictures with those stored in other repositories, she
or he will also have to develop the software that will accomplish this
task.
Putting XML into Action
To make XML do something
useful, other technologies are necessary. The most significant of these
is the Document Object Model (DOM). The DOM is a complex subject that is
beyond the scope of a short article. In a nutshell, the DOM is an interface
between programs and an XML document. The DOM doesn't care what language
a program is written in—there's no particular reason why a program that
uses the DOM couldn't be written in Java, C, BASIC, or any number of other
languages. The DOM is very important because it allows programs to do useful
things with XML such as finding, sorting, manipulating, and displaying
information. For practical purposes, nonprogrammers do not need to worry
about the technical details of DOM. Experienced programmers may find it
relatively easy to use DOM to work with XML, but it's not a task for those
without significant programming skills.
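To give programmers a feel for what this looks like, here is a minimal sketch in Python that uses the DOM interface in the standard library's xml.dom.minidom module. It is only an illustration, not code from any project described in this article. The data is an abbreviated copy of Figure 1 with shortened file names, and the function names are my own assumptions.

```python
from xml.dom import minidom

# Abbreviated copy of the Figure 1 document (file names shortened for the example).
PHOTO_XML = """<MyPhotoArchive>
  <photo title="Walking on the Beach at Sunset" filename="SunsetBeach.jpg">
    <dogs><name>Keiko</name></dogs>
    <people><name>Banerjee, Kyle</name></people>
    <place><country-state>OR</country-state><city>Manzanita</city></place>
    <date>1999-08-13</date>
  </photo>
  <photo title="Charming snakes" filename="Snakes.jpg">
    <people><name>Banerjee, Kyle</name><name>Lincicum, Shirley</name></people>
    <place><country-state>India</country-state><city>New Delhi</city></place>
    <date>1998-08-26</date>
  </photo>
</MyPhotoArchive>"""


def text_of(node):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


def photos_with_dog(xml_text, dog_name):
    """Return the titles of photos whose <dogs> element names dog_name."""
    document = minidom.parseString(xml_text)
    titles = []
    for photo in document.getElementsByTagName("photo"):
        # Collect every <name> that sits inside a <dogs> element of this photo.
        names = [text_of(name)
                 for dogs in photo.getElementsByTagName("dogs")
                 for name in dogs.getElementsByTagName("name")]
        if dog_name in names:
            titles.append(photo.getAttribute("title"))
    return titles


if __name__ == "__main__":
    print(photos_with_dog(PHOTO_XML, "Keiko"))
```

The same getElementsByTagName and getAttribute calls are available in DOM implementations for other languages, which is what makes the DOM language-neutral.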
Non-programmers can also
find uses for XML, but their options are more limited. Writing eXtensible
Stylesheet Language (XSL) style sheets is a relatively easy task for anyone
who has worked with any kind of style sheet, such as Cascading Style Sheets
(CSS). The important difference between XSL and other types of style sheets
is that XSL can perform calculations and rearrange a document's structure
rather than only control its appearance. XSL style sheets can selectively
display, reorder, or modify any element. For example, the style sheet in Figure 2
can be applied to the XML example in Figure 1 to create a small HTML table
that displays in bold text the titles of all photographs containing my
dog, along with the pictures. The location of the picture is listed on
the line below the title. (See Figure 2.)
As you can see from the
example, XSL is relatively simple to use. It is particularly useful for
converting XML to HTML and for selectively displaying and sorting data,
though it has many other uses.
The Hype vs. the Reality
It's important to recognize
what XML can do and what it can't. It is very easy to find articles explaining
how XML is a universal format that will eliminate the need for proprietary
data formats and problems associated with converting one type of data to
another. XML is definitely a significant development that has attracted
a great deal of interest on the part of libraries, the technical community,
and vendors. However, for historical, technical, and social reasons, it
seems unlikely that XML will ever become a universal language that integrates
all information resources.
All XML does is provide
a standard for defining containers that store information. This is extremely
useful, because it simplifies the transfer of information from one system
or program to another. Like other powerful technologies such as relational
databases, the Structured Query Language, the World Wide Web, and HTML,
XML is only a tool. Just as a word processor cannot write an interesting
article by itself, XML cannot automatically find what people want, present
information in an easy-to-read format, or solve problems relating to the
content of information resources.
XML can't do your work for
you, but it can be transmitted, manipulated, and arranged relatively easily
by programmers. As a practical matter, there will never be enough time
or money to unlock the full potential of all XML-encoded data. However,
it is certainly possible to develop useful applications that involve specific
resources such as a journal database, the online catalog, and some local
archival finding aids. XML is particularly useful for presenting the same
information for different users, since a style sheet can be used to format
a Web-based news service for a businessperson with a wireless palmtop computer,
a blind member of the community with a talking computer, or a college student
in a computer lab.
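The idea of one source with several presentations can be sketched in a few lines. The example below uses Python rather than an XSL style sheet, simply to keep it self-contained. The news feed, its element names, and the two output formats are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up news feed; the element names are assumptions for this sketch.
NEWS_XML = """<news>
  <item>
    <headline>Library extends hours</headline>
    <summary>Open until midnight during finals week.</summary>
  </item>
  <item>
    <headline>New database trial</headline>
    <summary>Trial access runs through May.</summary>
  </item>
</news>"""


def read_items(xml_text):
    """Return (headline, summary) pairs from the feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("headline"), item.findtext("summary"))
            for item in root.iter("item")]


def as_html(xml_text):
    """Full presentation for a desktop Web browser."""
    rows = ["<li><b>{}</b><br />{}</li>".format(escape(headline), escape(summary))
            for headline, summary in read_items(xml_text)]
    return "<ul>" + "".join(rows) + "</ul>"


def as_headlines(xml_text):
    """Headlines only, one per line, for a small screen or a talking computer."""
    return "\n".join(headline for headline, _ in read_items(xml_text))


if __name__ == "__main__":
    print(as_html(NEWS_XML))
    print(as_headlines(NEWS_XML))
```

Both functions read the same XML. Only the final formatting step differs, which is the role a style sheet would play.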
XML technologies are very
flexible because XML is meant to be a general-purpose tool. However, this
versatility comes at the expense of computational efficiency. For this
reason, XML is poorly suited for situations where the normal tool of choice
would be a database. Databases physically format the data in such a way
that it can be accessed with a minimum number of disk accesses and operations
in memory. For small amounts of information, such considerations aren't
important. However, certain operations, such as searching a million records
and returning the titles written by a particular individual, can only be
performed efficiently by applications that have been optimized for such tasks.
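The difference can be sketched with Python's built-in sqlite3 module. This is a toy example under stated assumptions: the records, element names, and function names are invented, and two records obviously cannot demonstrate performance. The point is only that the XML version must read every record, while the database can use an index on the author column.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented records; a real catalog would hold far more.
RECORDS_XML = """<records>
  <record><author>Author A</author><title>First Sample Title</title></record>
  <record><author>Author B</author><title>Second Sample Title</title></record>
  <record><author>Author A</author><title>Third Sample Title</title></record>
</records>"""


def titles_by_scan(xml_text, author):
    """Walk every record in the XML document and keep the matches."""
    root = ET.fromstring(xml_text)
    return [record.findtext("title") for record in root.iter("record")
            if record.findtext("author") == author]


def load_into_database(xml_text):
    """Copy the records into an in-memory SQLite table indexed by author."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE records (author TEXT, title TEXT)")
    connection.execute("CREATE INDEX idx_author ON records (author)")
    root = ET.fromstring(xml_text)
    connection.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [(record.findtext("author"), record.findtext("title"))
         for record in root.iter("record")])
    return connection


def titles_by_index(connection, author):
    """Let the database find matching rows through its index."""
    rows = connection.execute(
        "SELECT title FROM records WHERE author = ? ORDER BY title", (author,))
    return [title for (title,) in rows]


if __name__ == "__main__":
    db = load_into_database(RECORDS_XML)
    print(titles_by_scan(RECORDS_XML, "Author A"))
    print(titles_by_index(db, "Author A"))
```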
Practical Applications
For years, libraries have
been quietly using XML to perform functions such as improving access to
archival materials, simplifying interlibrary loan processing, and enhancing
digital collections, but increased reliance on the Internet for delivering
information resources has brought XML into the mainstream, where its impact
is starting to be felt by libraries of all sizes. As early as 1993, the
library at the University of California, Berkeley started developing
a method for encoding archival materials in SGML, the markup standard from
which XML was later derived. The outcome of this project
was the development of the Encoded Archival Description (EAD) standard,
which is now maintained by the Library of Congress. Use of EAD has been
increasing steadily over the years as a growing number of archival finding
aids have been moved to the Web.
For the past several years,
individual libraries have been improving their services and saving money
by developing their own XML applications. Since 1998, Oregon State University
has been using an application called InterLibrary Loan Automated Search
And Print (ILL ASAP) to automatically search interlibrary loan requests
and print request forms sorted by location and call number, complete with
availability information, scannable Ariel addresses, shipping labels (if
no Ariel address is present), and billing data customized to the borrowing
library or consortium involved. This free application has been adopted
by dozens of libraries around the country.
More-ambitious XML projects
have also been successfully implemented. The Washington Research Library
Consortium uses XML to provide access to subscription databases, digital
collections, materials requested via interlibrary loan, and library catalogs
that run on a combination of commercial, open source, and locally developed
platforms. This system, known as ALADIN (Access to Library And Database
Information Network), not only delivers content to seven academic research
libraries, but also performs critical related tasks such as patron authentication
using XML messages transmitted between applications over the Web.
In the spring of 2002, the
Library of Congress announced an official specification for representing
MARC data in an XML environment, MARC XML. Even though sharing data between
catalogs is relatively easy because of widespread support for the MARC
format, the ability to express MARC data in XML is useful for any library
trying to develop tools or access mechanisms that combine MARC data (e.g.,
the online catalog) with non-MARC resources (e.g., a locally maintained
database or special collection). Now that a standard has emerged for representing
MARC, it is reasonable to assume that vendors and others will develop tools
that take advantage of the huge amount of data already stored in MARC format.
As a matter of fact, within a few weeks of the announcement of LC's specification,
well-known tools used for manipulating MARC records such as JAMES (Java
MARC Events) and MarcEdit already contained support for the new standard.
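For readers curious about what MARC data looks like in XML, the sketch below reads one record with Python's standard library. The record is hand-made for illustration from the citation in Further Reading. It is not an actual Library of Congress record, and its leader and control number are placeholders. The function name is my own.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARC XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-made sample record; leader and control number are placeholders.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">sample0001</controlfield>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">XML in libraries /</subfield>
    <subfield code="c">edited by Roy Tennant.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">New York :</subfield>
    <subfield code="b">Neal-Schuman Publishers,</subfield>
    <subfield code="c">2002.</subfield>
  </datafield>
</record>"""


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` in datafields numbered `tag`."""
    path = "marc:datafield[@tag='{}']/marc:subfield[@code='{}']".format(tag, code)
    return [subfield.text for subfield in record.findall(path, MARC_NS)]


if __name__ == "__main__":
    record = ET.fromstring(SAMPLE_RECORD)
    print(subfield_values(record, "245", "a"))
    print(subfield_values(record, "260", "b"))
```

Once the record is in this form, the same XML tools used for non-MARC resources can read it, which is what makes combining the two kinds of data practical.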
XML's Future in Libraries
As more libraries use XML,
they are finding more uses for it. The eScholarship initiative at the California
Digital Library not only uses XML to store books in a standardized format,
but it also uses XML technologies to allow users to define their own displays.
The Open Archives Initiative (OAI), an effort supported by OCLC, has developed
a protocol that makes it easy to send a query to a database over the Web
and receive the results in XML. OAI effectively makes it possible to perform
searches of multiple databases simultaneously without the need for proprietary
hooks into local databases. While this functionality is very similar to
what was promised with Z39.50, OAI is much easier to implement, so the
hope is that it will be widely used for many kinds of databases. [Editor's
Note: For more on OAI, see Marshall Breeding's article on page 24.]
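As a rough illustration of how simple an OAI exchange is, the sketch below builds a request URL and pulls titles out of a response. It assumes version 2.0 of the OAI protocol. The repository address is hypothetical, the response is a trimmed, hand-written sample (a real one carries more elements, such as a response date and record datestamps), and the function names are my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical repository address, used only for this example.
BASE_URL = "http://repository.example.org/oai"

# Dublin Core element namespace used inside oai_dc records.
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

# Trimmed, hand-written sample of a ListRecords response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample finding aid</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def build_request(base_url, verb, **arguments):
    """Build an OAI request URL: the verb plus its arguments as a query string."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


def record_titles(response_xml):
    """Return every Dublin Core title found in an OAI response."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iter(DC_TITLE)]


if __name__ == "__main__":
    print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
    print(record_titles(SAMPLE_RESPONSE))
```

Because the request is an ordinary Web address and the reply is XML, no proprietary client is needed on either end.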
Over the next few years,
the impact of XML on libraries is certain to increase. More likely than
not, it will not be obvious when XML is used to improve library services,
much as it is not obvious what kind of hardware and software a library
uses for its catalog. The simplicity and flexibility of XML make it possible
to integrate services and resources in ways that would have been impossible
just a few years ago. Vendors, libraries, and open source programmers are
all interested in finding ways to search many kinds of resources with a
single query, and XML represents a major step forward in making this goal
a reality.
Further Reading
Those needing detailed information
about specific library projects that use XML may wish to consult: Roy Tennant,
ed. XML in Libraries. New York: Neal-Schuman Publishers, 2002. 175 pp.
Publication information
is available at http://www.neal-schuman.com/db/0/290.html.