While flying into Portland, Oregon, for the second Joint Conference
on Digital Libraries (JCDL 2002), I realized I was feeling anxious. I'd
never before been to an event organized by the Association for Computing
Machinery (ACM) and IEEE. The people who would be presenting were academics.
I knew a few of them by name or reputation, but most were strangers. I
was afraid the papers would be completely theoretical, full of mathematical
formulas, and way over my head. I was prepared to be intimidated and joked
before leaving my office that if there were too many formulas, I was walking
out.
It's been common knowledge for the past 9 months that conference attendance
in all disciplines is down. The reasons cited are travel fears and the
economic slowdown. Apparently, neither has had an effect on those interested
in digital libraries. This year's event actually attracted more people
than last year's—about 450 from 19 countries. It's a tribute not only to
the power of digital libraries but also to the organizers at the Oregon
Health & Science University. They include general chair William Hersh
and Lynetta Sacherek, the event's treasurer, local arrangements organizer,
and all-around troubleshooter extraordinaire. The program chair, Gary Marchionini,
from the University of North Carolina at Chapel Hill, was unfortunately
unable to attend, but sent remarks by video.
Copyright Issues
Although I worried that the practical would be overlooked in a flurry
of high-technology, blue-sky prognostications, the first day's opening
speaker reassured me. Jessica Litman, a law professor at Wayne State University
and author of Digital Copyright, has a very clear view of copyright
from both the librarian and computer scientist perspectives. She began
by noting that electronic information is dynamic, ubiquitous, and shared.
Since the Internet has made searching quick and easy, people are finding
that it's fun to look things up and share what they discover. This transforms
information space and leads to public policy issues surrounding the archiving
of constantly changing information, evaluating the quality of Web information,
and preserving privacy and anonymity.
Litman, in both her talk and her answers to questions, kept repeating
that copyright is frustrating. One of her primary frustrations: The Constitution
says that copyright exists to advance and spread knowledge, but recent
interpretations of copyright law are aimed at preventing knowledge sharing.
She advised all coders in the audience against designing copyright protections
into digital libraries. First of all, even lawyers aren't sure what the
rules are. This makes it impossible for a coder to get it right. Second,
we don't know what the courts may rule going forward. "If we code to protect
information, it destroys wild information. It does violence to how people
interact with information," Litman said. She offered the analogy of checking
out a book from the library: "The librarian doesn't phone the publisher
to ask permission." Ending with a call to optimize digital libraries for
storage and information use rather than for copyright protection, Litman reminded us,
"There are no certainties in copyright law."
Other Presentations
Following the opening session, the conference split into three concurrent
sessions, two of which featured paper presentations. The third was billed
as a panel discussion. I found this slightly confusing, since the panelists
usually presented papers, a few of which appear in the proceedings.
I was not particularly impressed by the papers in the "Summarization
and Question Answering" session. The University of Arizona's TXTRACTOR
project utilizes sentence-selection heuristics to rank text segments. Daniel
McDonald explained how his group uses artificial intelligence to create
indicative summaries and talked about why he thinks text extraction is
superior to an abstract. He didn't convince me.
David Pinto's presentation about QuASM (pronounced "chasm") did have
one screen of mathematical formulas, but I stayed put. The notion of applying
artificial intelligence techniques to numeric tables was intriguing, but
Pinto's group at the University of Massachusetts has a long way to go in
understanding the nuances of tabular data. Although the members are working
with a large database, the questions they ask it derive from the data,
not from practical need. In response to a query from the audience, Pinto
acknowledged that they couldn't factor in data expressed in thousands.
In my work, there's a big difference between 20,000 and 20 million.
I was happier with the afternoon sessions "Studying Users" and "Classification
and Browsing." Although it seems intuitively obvious that people who are
familiar with a topic will approach a search engine differently than those
who are unfamiliar with it, the work of Diane Kelly and Colleen Cool—which
they presented during "The Effects of Topic Familiarity on Information
Search Behavior"—confirms that. As a side benefit, the presentation taught
me that you can't always trust people to self-evaluate their level of expertise.
Some problems are perennial. How to find relevant documents is a topic
that has bedeviled searchers for decades. Researchers at Scotland's Robert
Gordon University School of Computing have developed SmartSkim, a tool
that uses a histogram to scan lengthy documents and identify relevant portions
within them. Although this group's paper contained some math that was,
as I had feared, over my head, the notion of relevance profiling was intriguing.
I look forward to further research in this area.
A panel discussion titled "Biodiversity and Biocomplexity Informatics"
caught my eye. As this is a topic outside my area of expertise, I was eager
to understand its relevance to digital libraries as well as the issues
behind it. The panel talked about how data collected for one purpose ended
up being used for another. Tracking biodiversity on a worldwide basis has
public policy implications that affect both the "citizen scientist" and
the academic scientific community. I was particularly captivated by Larry
Speers' explanation of the history and work of the Global Biodiversity
Information Facility, which is headquartered at the University of Copenhagen's
Zoological Museum. Stanley Blum's presentation on taxonomies for natural
history museums made clear how broad those museums' mission is: to document
life on earth.
Many of the presentations showed digital special collections rather
than full digital libraries. Examples included the Miguel de Cervantes
Digital Library from Spain's University of Alicante, the National
Archives of Singapore's collection, and the U.S. Geological Survey's Marine
Realms Information Bank. All these were on display in the "Demonstrations
and Posters" session, which was held on the first day of the conference.
Interesting Uses
Other subject areas for digital libraries were equally eclectic. Constructing
a vertical search engine for nanotechnology (an ongoing project at the
University of Arizona) turns out to be more complex than you would think.
Researchers approach the topic from different perspectives, use different
languages, and expect the technology to have diverse applications. Two
products are being tested: one server-side and the other client-side.
So far, research indicates that the former is more efficient.
Search interfaces and techniques always capture my attention. I found
some innovative ideas and a few repeats of techniques that have been in
use for some time. Cultural concerns were something new to me. For example,
the Maori people of New Zealand have a concept of information flow that
a text-based digital library can't accommodate. The presentation on search
facilities for Internet Relay Chat made me wonder whether such tools will make
corporations and government agencies nervous. The courts have been looking at
e-mail, forcing businesses and government departments to save it, but so far
they haven't touched on instant messaging.
Closing Keynote
The final day's keynote speech by Daniel Greenstein, director of the
California Digital Library, was both entertaining and educational. Saying
that digital libraries create demand for traditional services, he emphasized
the need for traditional skills to be deployed in new ways. Digitization
is a means to an end, he reminded us, joking, "If it works, it's not research."
He gave a few concrete examples of programs that have turned into building
blocks: the National Science Digital Library in the U.S. and JISC in the
U.K. The preservation aspects of digital libraries are important. It's
not just access; it's having a collection to access.
Final Thoughts
Interestingly, I didn't find representatives of commercial search companies
among the attendees. There was no one from old-style search organizations
such as Dialog or LexisNexis, or from newer companies like Google or AltaVista.
To be fair, there were participants from OCLC and IBM Almaden Labs. Plus,
some very well-known search engines have sprung from university research
projects—Carnegie Mellon University's Lycos is a prime example. Carnegie
Mellon sent a number of people to the event.
I found the conference much less intimidating and more accessible for
my practical librarian brain than I had anticipated. Going to sessions
outside my comfort level—I'm thinking particularly of those on music digital
libraries—was an invigorating experience. Learning what academic researchers
are doing suggests what future real-world products will look like. I was
impressed with the atmosphere of camaraderie as well as the willingness
of participants to ask questions of presenters and poke holes in their
research, all in a collegial fashion. I only hope that these academics
keep in mind the practical applications of their work and don't waste their
time reinventing wheels we've been rolling along on for years.
Digital libraries are growing in importance. They are becoming mainstream
and part of the overall fabric of our information world. This conference
hit all the high points of research and development by detailing both research
in progress and finished projects. JCDL is a worthwhile event to attend,
even if you're not an academic researcher.
Marydee Ojala is editor of ONLINE magazine, conference chair
of the National Online conference, and a longtime business researcher.
Her e-mail address is marydee@infotoday.com.