Volume 15, Number 6, June 1998
Libraries have always been repositories of existing information. They gather it, select and organize it, make it accessible, and preserve it. The vision of digital libraries that emerges from this conference is far beyond the scope of current libraries. It places the library squarely in the middle of the information-creation business. Users will be able to use new information tools to combine pieces of information and to create new ones.
Imagine taking maps and overlaying them one on another to create a 3-D virtual fly-through of a marshland. Walk inside a molecule. Search newscasts through an automatically created text database of their spoken words. Then choose a news segment to view on your screen, starting precisely at the spot that discusses your topic.
These are no longer fantasies. They are the results of 4 years of research, and they will change profoundly how libraries work and how librarians think of themselves. The goal of these projects was to produce both research and a usable collection and product. Based on the presentations I saw in Santa Barbara, they have succeeded.
In addition to creating information tools, however, these projects were instrumental in initiating debates on societal issues, such as intellectual property, physical vs. digital forms of materials, how people use information, what information packages should look like, and what tools should be developed to manipulate their contents. What will the impact be on existing institutions, and on society? How will scholars choose to communicate if they can either send out research results today or have them appear in a paper peer-reviewed journal 2 years from now? These are not easy choices for libraries, for publishers, or for scholars.
Other issues include the demand for better infrastructure, rather than for new buildings; professors' fears that their access to physical materials will dwindle; the increases in both cost and amount of information libraries must handle; how libraries can plan for change; and how they must budget for new technologies, as well as integrate them into the current process. Of particular concern is the chaos in the marketplace, with established publishers "throwing out trial balloons" for changes in price and subscription models. And finally, Lucier noted that academic institutions are inherently conservative, and not kind to innovators.
The California Digital Library will open its virtual doors by this fall. In the process of building this library, its creators have reached some interesting conclusions:
The California Digital Library will have a collection of high-quality digital materials. It will offer integrated tools for information delivery, including tools to create, share, manipulate, store, and use knowledge, all sharing a consistent interface. It will include published literature and digital-only literature, including scientific data sets and special collections. One interesting sidelight: They must develop licensing agreements to their own collections, as well as negotiate agreements for access with publishers of digital information.
One of the threads running through this conference was the need for standards. For libraries conscious of the need for permanent archives, this is a knotty problem. How should materials be described so they can be retrieved? What happens if the hardware on which materials are viewed changes? Can we port documents to new platforms without loss of integrity? How do we develop robust, reliable tools that can be shared among organizations, and evaluate their effectiveness?
The University of Michigan has created a digital library for science in the schools, grades six through nine. The project was a vehicle for investigating the use of ontologies and intelligent agents, as well as for creating economic incentives for use. They now have an operating model that includes age-appropriate materials, as well as tools for working with the information. Visit it at http://mydl.soe.umich.edu.
Carnegie Mellon's Informedia Digital Video Library entranced me with its possibilities. As far as I can see, if intellectual property problems don't derail it, this is the "killer application" for digital libraries. I say this not only because the application has such strong appeal, but because the Carnegie Mellon group has the right idea about how to approach creating an information application. Informedia is a system that takes video presentations, such as newscasts, and creates a searchable video collection from them automatically.
They do this by combining several kinds of information technologies in order to extract all levels of information from the original. They use speech recognition and natural language processing to create a text database from the narrative parts of the video. They capture words that appear on the screen, but aren't spoken, such as names of speakers. Image recognition identifies people or scenery. The user interface has many ways to identify and ask for what you need. Search on a typed-in query, plus the face of your subject. Get back what looks like a set of thumbnail images that represent each video clip, with a visual relevance ranking in the form of a colored column: the higher the column, the greater the relevance. Examine a timeline of the clip. The query words are each identified by a different color wherever they appear on the timeline, so that you can find where they cluster, and start viewing from that point.
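The idea of starting playback where query words cluster on the timeline can be illustrated with a small sketch. This is not Informedia's code. The function name `best_start`, the 30-second window, and the sample hit times are all assumptions made for illustration. Given the times at which any query word was recognized, it picks the hit that opens the densest window.

```python
from bisect import bisect_right

def best_start(hit_times, window=30.0):
    """Return the hit time (seconds) that opens the window with the most hits.

    hit_times: times at which any query word occurs in the clip (hypothetical data).
    window:    length in seconds of the span to examine (an assumed value).
    """
    times = sorted(hit_times)
    if not times:
        return 0.0  # no hits: fall back to the start of the clip
    best_time, best_count = times[0], 0
    for i, start in enumerate(times):
        # Index just past the last hit that falls within [start, start + window].
        end = bisect_right(times, start + window)
        count = end - i
        if count > best_count:
            best_time, best_count = start, count
    return best_time

# Hypothetical hit times, in seconds into the clip, for all query words combined.
hits = [12.0, 95.5, 101.0, 104.2, 110.8, 240.0]
start_at = best_start(hits)
```

With these made-up times, four of the six hits fall between 95.5 and 110.8 seconds, so a viewer would begin there rather than at the isolated hit at 12 seconds.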
None of these is a perfect technology, as the developers are the first to point out. But the combination of imperfect technologies, each yielding some searchable clues, results in surprisingly good retrieval. The tools that they have developed to go with it are fun to use, and make finding information quite easy. Try it at http://informedia.cs.cmu.edu.
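One way to see why several imperfect clues can add up to good retrieval is a simple weighted combination of scores. The sketch below is an assumption for illustration only, not the method the Carnegie Mellon group uses. The modality names, weights, and scores are invented, and a weighted sum is just one of many possible ways to merge evidence.

```python
def fuse_scores(evidence, weights):
    """Rank clips by a weighted sum of per-modality scores (each score in 0..1)."""
    ranked = []
    for clip_id, scores in evidence.items():
        total = sum(weights[modality] * score for modality, score in scores.items())
        ranked.append((clip_id, total))
    # Highest combined score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Assumed weights: how much each (hypothetical) recognizer is trusted.
weights = {"speech": 0.5, "screen_text": 0.3, "face": 0.2}

# Invented scores for three clips.
evidence = {
    "clip_a": {"speech": 0.9, "screen_text": 0.0, "face": 0.0},
    "clip_b": {"speech": 0.6, "screen_text": 0.7, "face": 0.8},
    "clip_c": {"speech": 0.2, "screen_text": 0.1, "face": 0.0},
}

ranking = fuse_scores(evidence, weights)
```

In this made-up data, clip_b has a weaker speech match than clip_a, yet it ranks first because the on-screen text and the face match both support it. No single recognizer has to be right on its own.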
While Stanford University's project, Infobus, may not have the glamour of Informedia, it may be the strong foundation that will make digital libraries work, and work together. Stanford's goal was to create a modular structure that would allow a digital library to plug in heterogeneous modules without entirely rewriting the system to accommodate them. Each module has a uniform "wrapper" that allows it to interact with the system as a whole. Thus, you could plug in modules, such as DIALOG, FOLIO, DigiCash, and an image database. The system takes a query from the user, decides which resource is the best place to find an answer, and sends the query, appropriately structured, to that module. Answers are translated back into the common interface and presented to the user. Imagine searching all online systems through a single interface.
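The wrapper idea can be sketched in a few lines. The code below is a hypothetical illustration of the general pattern, not Stanford's actual interfaces. Every class, method, and query syntax in it is invented. Each module hides its native query form behind the same two methods, and a small routing function picks the module that claims the best fit.

```python
from abc import ABC, abstractmethod

class Wrapper(ABC):
    """Uniform interface that every (hypothetical) module presents to the system."""
    name = "unnamed"

    @abstractmethod
    def suitability(self, query):
        """Return a 0..1 estimate of how well this source fits the query."""

    @abstractmethod
    def search(self, query):
        """Translate the query to the source's native form; return results as strings."""

class BibliographicWrapper(Wrapper):
    name = "bibliographic"

    def suitability(self, query):
        return 0.9 if "article" in query.lower() else 0.2

    def search(self, query):
        native = f"FIND {query.upper()}"  # stand-in for a native command language
        return [f"[{self.name}] result for {native}"]

class ImageWrapper(Wrapper):
    name = "images"

    def suitability(self, query):
        return 0.9 if "photo" in query.lower() else 0.1

    def search(self, query):
        keywords = query.lower().split()  # stand-in for a keyword-based image index
        return [f"[{self.name}] result for {keywords}"]

def route(query, wrappers):
    """Send the query to the wrapper reporting the highest suitability."""
    best = max(wrappers, key=lambda w: w.suitability(query))
    return best.search(query)

modules = [BibliographicWrapper(), ImageWrapper()]
answers = route("photo of marsh", modules)
```

Adding a new source under this sketch means writing one more wrapper class. The routing function and the user-facing interface stay unchanged, which is the property the modular design is after.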
The Alexandria Digital Library at the University of California at Santa Barbara was one of the first to create a usable product. Its geographic information system now has a new Java interface. It searches all kinds of "spatially referenced" information: maps, satellite images, and digital elevation models. It can find a map of a city of over 5,000 people in the Mississippi Valley, which has nearby Indian burial sites, that shows the road network. It can find associated 19th century photographs. The system is modular, so that as new technologies are developed, they can be added. Leave plenty of time to explore this at http://www.alexandria.ucsb.edu.
The University of California at Berkeley concentrated on developing tools for using very large collections of digital data. Its goal was to create a system and tools that would make the work cycle more efficient for the user. Research was required to understand how information finding and use fit within the work cycle of the user, so they analyzed how land use planners, environmentalists, or zoologists use information. With that starting point, they designed several interesting systems. CalFlora is a database of pictures and information about California's plants. Blobworld finds pictures based on areas of color and shape. Less glitzy, but extremely useful, they have created document recognizers for some types of documents, such as tables, and then used them to perform such feats as creating a database from a table, and attaching it to a map to create new documents. Multivalent documents construct layers on top of the original, so that all versions can exist simultaneously (http://elib.cs.berkeley.edu).
Rutgers University's datamining project, described by Nabil Adam, created a data warehouse and analysis tools for decision making. It consists of a state-of-the-art environmental monitoring system plus satellite images, maps, including geological survey maps, and visualization techniques. One of the most impressive tools combines several maps into one, and then creates a virtual fly-through of an area.
While this presentation was a too-brief overview, it is worth examining in detail the analysis King and Tenopir have done to help libraries predict their break-even points for subscribing to a journal vs. getting copies on demand. They have written a number of articles on the subject with helpful guidelines for libraries contending with rising journal prices.
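To make the break-even idea concrete, here is a deliberately simplified calculation. It is not King and Tenopir's model, which is described in their own articles. The cost categories, the function name `break_even_uses`, and the dollar figures are assumptions chosen only to show the arithmetic. A subscription costs a fixed price plus a small cost per use, while on-demand copies cost a larger amount per use.

```python
import math

def break_even_uses(subscription_price, cost_per_copy, cost_per_subscribed_use=0.0):
    """Return the number of uses per year at which subscribing costs no more
    than ordering copies on demand, or None if that point is never reached.

    Subscription cost: subscription_price + uses * cost_per_subscribed_use
    On-demand cost:    uses * cost_per_copy
    """
    saving_per_use = cost_per_copy - cost_per_subscribed_use
    if saving_per_use <= 0:
        return None  # on-demand is never more expensive per use
    return math.ceil(subscription_price / saving_per_use)

# Invented figures: a $600 journal, $15 per on-demand copy,
# $3 of staff and shelving cost per use of a subscribed copy.
uses_needed = break_even_uses(600, 15, 3)
```

With these invented figures the break-even point is 50 uses a year. A journal read less often than that would be cheaper to supply copy by copy.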
Gary Marchionini maintained that digital library design depends on the users, on the content and how they will use it, and on their tasks. Just as special libraries and public libraries differ, so too must digital libraries serving disparate populations. And the problem is even harder with digital libraries, since their user populations are unknown. He suggested creating alternative interfaces that depend on both user preferences and such easy-to-detect elements as type of equipment and bandwidth available to the individual. In addition, systems must build in help, to teach the user as he or she learns the system.
In the last 5 years, we have seen digital libraries grow from dream to reality. In the process, the field has attracted fine minds trying to solve nicely complex problems of information storage, presentation, access, and use. This ferment makes for a lively debate. Who could ever have thought libraries were stodgy?
Advances in Digital Libraries was sponsored by the IEEE Computer Society, NASA/Goddard Flight Center, the Library of Congress, the National Library of Medicine, the Alexandria Digital Library, CESDIS, Hughes Aircraft, and IBM.
Susan Feldman is owner and president of Datasearch, an information consulting firm specializing in digital libraries and search engines. She can be reached at sef2@cornell.edu.