IT Report from the Field: Joint Conference on Digital Libraries 2002

Volume 19, Issue 8 — September 2002

Despite its heavy academic overtones, this event offered a lot of practical information
by Marydee Ojala

While flying into Portland, Oregon, for the second Joint Conference on Digital Libraries (JCDL 2002), I realized I was feeling anxious. I'd never before been to an event organized by the Association for Computing Machinery (ACM) and IEEE. The people who would be presenting were academics. I knew a few of them by name or reputation, but most were strangers. I was afraid the papers would be completely theoretical, full of mathematical formulas, and way over my head. I was prepared to be intimidated and joked before leaving my office that if there were too many formulas, I was walking out.

It's been common knowledge for the past 9 months that conference attendance in all disciplines is down. The reasons cited are travel fears and the economic slowdown. Apparently neither has had an effect on those interested in digital libraries. This year's event actually attracted more people than last year's: about 450 from 19 countries. It's a tribute not only to the power of digital libraries but also to the organizers at the Oregon Health & Science University. They include general chair William Hersh and Lynetta Sacherek, the event's treasurer, local arrangements organizer, and all-around troubleshooter extraordinaire. The program chair, Gary Marchionini, from the University of North Carolina at Chapel Hill, was unfortunately unable to attend but sent remarks by video.

Copyright Issues
Although I worried that the practical would be overlooked in a flurry of high-technology, blue-sky prognostications, the first day's opening speaker reassured me. Jessica Litman, a law professor at Wayne State University and author of Digital Copyright, has a very clear view of copyright from both the librarian's and the computer scientist's perspectives. She began by noting that electronic information is dynamic, ubiquitous, and shared. Since the Internet has made searching quick and easy, people are finding that it's fun to look things up and share what they discover. This transforms information space and leads to public policy issues surrounding the archiving of constantly changing information, evaluating the quality of Web information, and preserving privacy and anonymity.

Litman, in both her talk and her answers to questions, kept repeating that copyright is frustrating. One of her primary frustrations: The Constitution says that copyright exists to advance and spread knowledge, but recent interpretations of copyright law are aimed at preventing knowledge sharing. She advised all coders in the audience against designing copyright protections into digital libraries. First of all, even lawyers aren't sure what the rules are. This makes it impossible for a coder to get it right. Second, we don't know what the courts may rule going forward. "If we code to protect information, it destroys wild information. It does violence to how people interact with information," Litman said. She offered the analogy of checking out a book from the library: "The librarian doesn't phone the publisher to ask permission." Ending with a call to optimize digital libraries for storage and information use rather than copyright, Litman reminded us, "There are no certainties in copyright law."

Other Presentations
Following the opening session, the conference split into three concurrent sessions, two of which featured paper presentations. The third was billed as a panel discussion. I found this slightly confusing, as the panel members usually presented a paper, a few of which are in the proceedings.

I was not particularly impressed by the papers in the "Summarization and Question Answering" session. The University of Arizona's TXTRACTOR project uses sentence-selection heuristics to rank text segments. Daniel McDonald explained how his group uses artificial intelligence to create indicative summaries and talked about why he thinks text extraction is superior to an abstract. He didn't convince me.

David Pinto's presentation about QuASM (pronounced "chasm") did have one screen of mathematical formulas, but I stayed put. The notion of applying artificial intelligence techniques to numeric tables was intriguing, but Pinto's group at the University of Massachusetts has a long way to go in understanding the nuances of tabular data. Although the members are working with a large database, the questions they ask it derive from the data, not from practical need. In response to a query from the audience, Pinto acknowledged that they couldn't factor in data expressed in thousands. In my work, there's a big difference between 20,000 and 20 million.

I was happier with the afternoon sessions "Studying Users" and "Classification and Browsing." Although it seems intuitively obvious that people who are familiar with a topic will approach a search engine differently than those who are unfamiliar with it, the work of Diane Kelly and Colleen Cool—which they presented during "The Effects of Topic Familiarity on Information Search Behavior"—confirms that. As a side benefit, the presentation taught me that you can't always trust people to self-evaluate their level of expertise.

Some problems are perennial. How to find relevant documents is a topic that has bedeviled searchers for decades. Researchers at Scotland's Robert Gordon University School of Computing have developed SmartSkim, a tool that uses a histogram to scan lengthy documents and identify relevant portions within them. Although this group's paper contained some math that was, as I had feared, over my head, the notion of relevance profiling was intriguing. I look forward to further research in this area.

A panel discussion titled "Biodiversity and Biocomplexity Informatics" caught my eye. As this is a topic outside my area of expertise, I was eager to understand its relevance to digital libraries as well as the issues behind it. The panel talked about how data collected for one purpose ended up being used for another. Tracking biodiversity on a worldwide basis has public policy implications that affect both the "citizen scientist" and the academic scientific community. I was particularly captivated by Larry Speers' explanation of the history and work of the Global Biodiversity Information Facility, which is headquartered at the University of Copenhagen's Zoological Museum. Stanley Blum's presentation on taxonomies for natural history museums made clear the incredibly broad mission of such museums: to document life on earth.

Many of the presentations showed digital special collections rather than full digital libraries. Examples included the Miguel de Cervantes Digital Library from Spain's University of Alicante, the National Archives of Singapore's collection, and the U.S. Geological Survey's Marine Realms Information Bank. All these were on display in the "Demonstrations and Posters" session, which was held on the first day of the conference.

Interesting Uses
Other subject areas for digital libraries were equally eclectic. Constructing a vertical search engine for nanotechnology (an ongoing project at the University of Arizona) turns out to be more complex than you would think. Researchers approach the topic from different perspectives, use different languages, and expect the technology to have diverse applications. Two products are being tested: one server-side and the other client-side. So far, research indicates that the server-side product is more efficient.

Search interfaces and techniques always capture my attention. I found some innovative ideas and a few repeats of techniques that have been in use for some time. Cultural concerns were something new to me. For example, the Maori people of New Zealand have a concept of information flow that a text-based digital library can't accommodate. The presentation on search facilities for Internet Relay Chat made me wonder whether such tools would make corporations and government agencies nervous. The courts have been looking at e-mails, forcing businesses and government departments to save them, but so far they haven't touched on instant messaging.

Closing Keynote
The final day's keynote speech by Daniel Greenstein, director of the California Digital Library, was both entertaining and educational. Saying that digital libraries create demand for traditional services, he emphasized the need for traditional skills to be deployed in new ways. Digitization is a means to an end, he reminded us, joking, "If it works, it's not research." He gave a few concrete examples of programs that have turned into building blocks: the National Science Digital Library in the U.S. and JISC in the U.K. The preservation aspects of digital libraries are important. It's not just access; it's having a collection to access.

Final Thoughts
Interestingly, I didn't find representatives of commercial search companies among the attendees. There was no one from old-style search organizations such as Dialog or LexisNexis, or from newer companies like Google or AltaVista. To be fair, there were participants from OCLC and IBM Almaden Labs. Plus, some very well-known search engines have sprung from university research projects—Carnegie Mellon University's Lycos is a prime example. Carnegie Mellon sent a number of people to the event.

I found the conference much less intimidating and more accessible for my practical librarian brain than I had anticipated. Going to sessions outside my comfort level—I'm thinking particularly of those on music digital libraries—was an invigorating experience. Learning what academic researchers are doing suggests what future real-world products will look like. I was impressed with the atmosphere of camaraderie as well as the willingness of participants to ask questions of presenters and poke holes in their research, all in a collegial fashion. I only hope that these academics keep in mind the practical applications of their work and don't waste their time reinventing wheels we've been rolling along on for years.

Digital libraries are growing in importance. They are becoming mainstream and part of the overall fabric of our information world. This conference hit all the high points of research and development by detailing both research in progress and finished projects. JCDL is a worthwhile event to attend, even if you're not an academic researcher.

Marydee Ojala is editor of ONLINE magazine, conference chair of the National Online conference, and a longtime business researcher. Her e-mail address is marydee@infotoday.com.
