Library discovery has changed and improved greatly over the years. For a long time, libraries relied on print catalogs that listed everything the library owned. With the advent of the web, libraries moved their catalogs online. Now, most libraries offer web-scale discovery for both the physical and online resources that they own or license, using various vended or open source solutions, sometimes in combination.
Web-scale discovery refers to “a pre-harvested central index coupled with a richly featured discovery layer providing a single search across a library’s local, open access, and subscription collections,” according to Athena Hoeppner (“The Ins and Outs of Evaluating Web-Scale Discovery Services,” Computers in Libraries, April 2012; infotoday.com/cilmag/apr12/Hoeppner-Web-Scale-Discovery-Services.shtml). Although library search at web-scale is taken for granted these days, it wasn’t until the late 2000s that libraries began to adopt and offer commercial web-scale discovery tools for their users. WorldCat Local started in 2007, followed by Summon in 2009, and then EBSCO Discovery Service, Encore Synergy, and Primo in 2010.
Using web-scale discovery tools, library patrons can search for and discover library resources in many different formats, such as a physical book, an online journal, a CD, or a reference database on the web, in a relatively seamless manner. Most importantly, web-scale discovery enables article-level search in a single search interface. Before its adoption, patrons searching for articles had to use the separate interfaces of many different online databases and journals, or a federated search tool that sent queries out to multiple databases and retrieved the results, which was often a slow and buggy process.
Web-scale library discovery can be seen as an attempt to provide library patrons with a search tool similar to an internet search engine (ISE) like Google, which provides page-level discovery. But unlike web search engines, which mostly deal with web documents, library discovery tools must handle far more heterogeneous materials, ranging from monographs, serials, and newspapers to maps, DVDs, and CDs. These materials come in various forms: physical and digital, local and web-based, free and licensed. This makes library discovery vastly more complicated. How does today’s web-scale discovery for libraries fare in library patron research? What is ahead for library search and discovery?
Search Context
Librarians at academic libraries often hear from their users that they begin their research with a Google search rather than a search in the library’s web-scale discovery page. When asked why, a common response is that Google works better. What do they mean by “better”? Let’s try “heart attack” as an example. If you search for this term in Google, a paragraph from the CDC (Centers for Disease Control and Prevention) appears at the top. Directly below, searchers find the “People also ask” section for commonly asked questions about heart attacks, which are basically FAQs. In the top-right part of the page, Google uses its Knowledge Graph to present an encyclopedia-like overview of heart attacks. The middle of the search results page displays more results about symptoms, causes, treatments, and other information from sources such as Mayo Clinic, MedlinePlus, and NIH (National Institutes of Health). This is followed by news stories grouped as “Top stories” about heart attacks. Near the bottom of the page is “Related searches,” which lists similar search queries such as “what causes a heart attack” and “heart attack first aid.”
Running the same search in a library’s web-scale discovery system yields search results that are less organized and probably more confusing to users. Books, articles, and materials of other formats are all thrown together and presented at the same time. The screen indicates that the search results have been sorted by relevance. In reality, however, it would be hard for searchers to make sense of the ranking or to guess what factors determined relevance. In a bento-box-type library search interface, search results are categorized according to their sources, such as library catalog, articles, databases, online journals, archival collections, and library guides. But it is not clear whether the bento-box interface renders search results in a more manageable and understandable manner.
This is just one example. This comparison cannot be easily generalized. But it is clear that Google’s search result display tries to capture as many common search contexts as possible for a given search term by categorizing and arranging results into various groups. Some libraries’ bento-box-type discovery interfaces also try to achieve this to some degree by organizing item records according to their respective source; web-scale discovery products rely more heavily on the facets that users need to actively peruse and select in order to deal with the large number of items in the search results they get.
More Manageable and Understandable Discovery
Around the globe, research output is being produced at a faster and faster pace. As a result, the volume of resources that libraries acquire has also drastically increased. However, our individual capacity to digest and process newly formed knowledge does not increase at the same rate. Thus, making the search for, discovery of, and access to a large volume of knowledge manageable and understandable to users becomes an ever more critical task for libraries.
The adoption of web-scale discovery has allowed libraries to offer article-level discovery in a single search box. But libraries’ web-scale discovery solutions have not fully succeeded in making the discovery of and access to a large volume of knowledge manageable and understandable to users. Why does Google seem to perform so much better at this than web-scale discovery solutions for libraries when the number of items to sort through is much greater? Why do Google’s search results appear to make more sense in terms of their relevance to a given search query than those from web-scale discovery solutions used by libraries? Google’s ranking relies much more on full-text search, while libraries’ web-scale discovery solutions mostly utilize structured metadata from bibliographic records, article abstracts, and indexes with controlled vocabularies. According to research by Jimmy Lin, full-text search, particularly paragraph-level search, is consistently more effective than abstract-only search (“Is Searching Full Text More Effective Than Searching Abstracts?” BMC Bioinformatics, vol. 10, no. 1, Feb. 3, 2009: pp. 46–60; doi.org/10.1186/1471-2105-10-46).
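Lin’s finding can be illustrated with a toy sketch. Assume, purely for illustration, that relevance is approximated by counting how many query terms appear in the searchable text (no real discovery system or search engine scores this simply). An article whose body paragraphs are indexed can match a query that its abstract never mentions:

```python
import re

def score(query, text):
    """Count how many query terms appear in the text (a crude relevance proxy)."""
    terms = re.findall(r"[a-z]+", query.lower())
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(t in words for t in terms)

# A hypothetical article: the abstract omits specifics buried in the body.
abstract = "We study cardiovascular outcomes in a large patient cohort."
paragraphs = [
    "Patients presenting with myocardial infarction were tracked for five years.",
    "Heart attack recurrence correlated strongly with delayed first aid.",
]

query = "heart attack first aid"

abstract_score = score(query, abstract)                  # abstract-only search
fulltext_score = max(score(query, p) for p in paragraphs)  # paragraph-level search

print(abstract_score, fulltext_score)  # 0 4
```

A metadata- or abstract-only index gives this article a score of zero for the query, while paragraph-level full-text search surfaces it, which is the gap Lin’s study quantifies.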
This poses an interesting question to libraries: Is the web search experience the ultimate goal of library discovery? If so, when the performance of library search and discovery falls behind that of web search engines, should libraries then simply accept that users’ research will mostly take place using web search engines and that library discovery solutions will come into play only afterward, when users need to find a copy from the library? Considering how long libraries have been striving to provide one-stop searching that can serve as the starting point of their users’ research, this feels like an end to that ambition.
Library Discovery vs. Internet Search Engines
It should be noted that ISEs and libraries regard information and knowledge quite differently. ISEs display information in a ranked order in which evaluation is inevitably embedded. In Algorithms of Oppression: How Search Engines Reinforce Racism (New York: NYU Press, 2018), Safiya Umoja Noble pointed out that while users employ the simplest queries they can in a search box, these queries do not always reflect the more complex thought patterns and concepts that searchers have about a topic. Because of this disjunction between users’ queries and their real questions on the one hand and information retrieval systems on the other, it becomes critically important to understand the complex linkages between the content of search results and their import as expressions of power and social relations. The seeming simplicity of a single search box in ISEs masks the complexity of the evaluation process performed by ISEs behind the scenes.
ISEs are ultimately an advertising platform, and as such, they commodify information. This includes information about the identities of people and groups. As advertising platforms that rank webpages based upon popularity, ISEs also reinforce existing ideologies and biases. This is why a search query for girls or women of a particular ethnicity tends to return pornographic content, or why a search for Black-on-white crime may rank white supremacist propaganda much higher than accurate statistics that refute it.
Nevertheless, people rarely question the motive behind the search results they see and use every day and are inclined to believe ISEs to be mostly neutral and credible. It is worth noting that an ISE’s interest in search relevance lies mostly in the degree to which relevance can optimize its ad performance and revenue. After all, ISEs are not a public resource for credible, accurate, and neutral information. While the accuracy of information and the credibility of websites may be important to ISEs, the greatest priority in their page-ranking algorithms is to get as many clicks as possible for their ads. In this respect, library discovery stands at the farthest point from web discovery.
Libraries often try to bring the user experience of their library discovery close to that of ISE discovery by customizing a user interface and tweaking relevancy ranking. But it is often forgotten that this process can lead to treating information and knowledge as things to be simply handed over to library patrons, who are viewed as passive consumers. Under this kind of understanding, the discovery of information and knowledge and library patrons’ learning are reduced to transactional one-time events.
However, unlike ISEs’ search and discovery, library search and discovery is not driven by a profit motive. Nor does its purpose lie in optimizing ad performance or maximizing clicks. Library discovery retrieves and provides access to information resources with the goal of enabling people to critically evaluate and use them to make better decisions for themselves and their lives, as individuals and as members of society. After all, libraries teach and promote information literacy, which involves developing a critical consciousness about information and learning to ask questions about the role of libraries and other institutions in structuring and presenting a single, knowable reality. In this context, information and knowledge are best understood as the product of socially negotiated epistemological processes and the raw material for the further making of new knowledge, as James Elmborg posits in “Critical Information Literacy: Implications for Instructional Practice” (The Journal of Academic Librarianship, vol. 32, no. 2, March 2006: pp. 192–199; doi.org/10.1016/j.acalib.2005.12.004).
Critical information literacy and the purpose it serves—informed deliberation, decision, and action—should be regarded as essential elements in library discovery. This is also why malicious or false information and relevancy rankings based upon unexamined biases, often found in ISEs’ search results, are not compatible with library discovery. Library patrons should be viewed as active civic, political, and moral agents. Knowledge and information should be regarded as catalysts for people’s civic, political, and moral practice, not something to be simply handed over to them to consume. Library search and discovery needs to be reconceptualized as a way to educate information users to help them realize their civic, political, and moral agency to its fullest potential.
Bohyun Kim is the Associate University Librarian for Library Information Technology at the University of Michigan Library.