Online Magazine

Vol. 29 No. 3 — May/June 2005

On The Net
Searching Books Between the Covers
By Greg R. Notess | Reference Librarian, Montana State University

To find information inside a printed book, people traditionally rely on an index or, for a few works, a concordance. With the advent of e-books, however, people could search the entire text, assuming they bought the e-book. Although a growing number of copyright-free books are now on the Web, those still under copyright remained unsearchable—until now.

First, in October 2003, Amazon introduced Search Inside the Book with extracts from some books and the full text from others. Then, Google started its Google Print program with book extracts. Next, in April 2004, Amazon launched A9.com, which combined Search Inside the Book with Web searching.

More recently, Google revamped its book portion of Google Print by expanding from extracts to full text from some publishers. In addition, book results, news, and product results now appear at the top rather than as part of the regular listings. Google also announced plans to digitize books at the libraries of Stanford, the University of Michigan, Harvard, the New York Public Library, and Oxford. These digitized versions will also be available via Google Print.

In all these endeavors, copyright issues ensure that most users cannot view the entire text of any works still under copyright. Amazon and Google limit the number of pages you can view even if the entire text can be searched. The number of books available from these programs is now in the hundreds of thousands, rather than the millions promised by Google when it finishes its library project, but that still raises the question of when and how we should use these resources.

FULL-TEXT BOOK SEARCH OPTIONS

While numerous commercial databases offer full-text searching of various e-book collections, this column focuses on the new, free search choices offered by search engines. Amazon’s Search Inside the Book is available at the main U.S. Amazon.com site but not yet at its U.K., Canada, or other international sites. Amazon initially gave Search Inside the Book matches for any query, but now some matches may only be seen after choosing the “Click here to see additional results” message. In addition, the record for each book that is searchable has a “search inside” icon with an arrow above the book jacket image. To see the page image from the book, Amazon requires a user name and password.

A9.com, the Amazon-owned search engine, has a variety of personalization features and a collection of databases. The Books database is, of course, the Amazon book catalog including the Search Inside the Book titles. Again, to see the page image that is located at Amazon, you need an Amazon user name and password. The page images are still located on the Amazon servers.

Google Print is a combination of initiatives. It includes book extracts from publishers, full-text books from publishers, and items from its library initiative. The difficulty now is that there is no direct access to the database. The Google Print site [http://print.google.com] gives only a brief overview of the service with no search box. Originally, the results were included in regular Google search results, but that changed late last year. Now the Google Print matches show up at the top, above the regular results. They have a books icon and a header of “Book Results for . . .” followed by up to three matching hits. Unfortunately, no option is available to get to more than the first three listed results. It is important to note that the full text of items within Google Print is not all directly searchable from Google. First, find the Google Print record and then use the internal search for just that title to get a deeper search.

Google Print results only show up for certain searches. Like Amazon, it only works at the main U.S. Google site and not yet at the various international versions. In the past, you could add site:print.google.com to a search term to limit results to the Google Print records. As of winter 2005, that technique only retrieved the small, dated collection of full-text magazine articles from Reed Business Information. Currently, the most effective way to get Google Print results to display is to preface search terms with book on or book about or just book. Note that book must be the first word in the query. The search book mark works while mark book does not. Occasionally, for very well-known titles, just the book title is needed with no prefix.
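The prefix rule is simple enough to sketch in a few lines of Python. This is only an illustration of the query-building step described above: the function names are made up for the example, the search URL is the ordinary public Google search address, and nothing here guarantees that Google will actually display book results for a given query.

```python
from urllib.parse import urlencode


def book_query(terms: str) -> str:
    """Return the search terms with the keyword 'book' as the first word.

    Hypothetical helper: it only encodes the rule that 'book' must lead
    the query (so 'mark' becomes 'book mark', never 'mark book').
    """
    words = terms.split()
    return " ".join(["book"] + words)


def google_search_url(query: str) -> str:
    """Build a plain Google search URL for the query (spaces become '+')."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = book_query("information technology security")
    print(query)
    print(google_search_url(query))
```

The helper always prepends the keyword rather than guessing whether a leading "book" is part of the title being searched, which keeps its behavior predictable.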

WHEN TO SEARCH BETWEEN THE COVERS

Several colleagues tell me they use neither of these tools. Other than initial experimentation, I rarely used them myself for the first few months. More recently, I have been trying to remember that the databases are there and find uses for them. My first success story was a question about a quotation source. Numerous Web sites included the quotation and the author’s name, but none gave even a partial citation to the specific work, much less the page numbers. After failing to find the answer via Web search engines and standard quotation sources, I tried the search at Amazon and struck gold. The phrase was found in the Search Inside the Book and let me view the exact page. I was able to provide the user with a full citation and page number without even leaving the reference desk. It was especially helpful since the book was not in the library’s collection.

Beyond quotation searching, book searches can be used to verify citations, especially for chapter titles, and to look at the actual copyright page of a book. Other applications include checking for plagiarism, hunting for intellectual property violations, and tracking mentions of trademarks and business names in both fiction and nonfiction books. For distance reference service, it lets both user and librarian look at the same page of a book while discussing it over the phone. For the reader who only remembers a character’s name but not the title or author, the book databases offer a new source in which to dig. Other uses likely abound, but we need to start considering what possibilities these databases offer, especially as they grow.

SEARCH STRATEGIES

In my limited experience so far with searching free book content on the Web, two distinct approaches emerge. First, try a phrase search for an extract from a book. Typically, a four- or five-word phrase can narrow results to just a few hits. Remember that these records no longer appear in regular Google results, so preface the search with book to find Google Print results.
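For readers who like to script their searching, the first approach can be sketched as a small Python helper that cuts a remembered passage into overlapping four- or five-word phrases, each wrapped in quotation marks for phrase searching. The function name, the default window size, and the sample sentence are all assumptions made for this illustration.

```python
def phrase_candidates(passage: str, size: int = 5) -> list[str]:
    """Return every run of `size` consecutive words as a quoted phrase.

    Surrounding punctuation is stripped from each word so that the
    phrases can be pasted directly into a search box.
    """
    words = [w.strip('.,;:!?"\'()') for w in passage.split()]
    words = [w for w in words if w]
    if not words:
        return []
    if len(words) <= size:
        # The passage is already short enough to search as one phrase.
        return ['"' + " ".join(words) + '"']
    return [
        '"' + " ".join(words[i:i + size]) + '"'
        for i in range(len(words) - size + 1)
    ]


if __name__ == "__main__":
    # Made-up sample sentence, not a quotation from any book.
    sample = "The old lighthouse keeper counted every passing ship."
    for phrase in phrase_candidates(sample, size=4):
        print(phrase)
```

Trying several windows is useful because a half-remembered quotation is often wrong in one or two words; a different slice of the same passage may still match.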

The second approach is to search for a book title as a phrase. Note that at Amazon (and thus A9’s books as well), phrase searching is not exact. Words within the phrase are stemmed—a search finds both singular and plural forms. In addition, stop words within a phrase are ignored.
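To see why such a phrase search is not exact, consider this toy Python model of stemmed matching that ignores stop words. It is a sketch under stated assumptions, not Amazon's actual algorithm: the stop-word list is a small illustrative set, and the "stemmer" does nothing more than drop a trailing s.

```python
# Illustrative stop-word list (an assumption, not Amazon's real list).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def naive_stem(word: str) -> str:
    """Crude singular/plural folding: drop one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(phrase: str) -> list[str]:
    """Lowercase, strip punctuation, drop stop words, and stem the rest."""
    words = [w.strip('.,;:!?"\'()').lower() for w in phrase.split()]
    return [naive_stem(w) for w in words if w and w not in STOP_WORDS]


def loosely_matches(query: str, title: str) -> bool:
    """True if the normalized query appears as a run inside the title."""
    q, t = normalize(query), normalize(title)
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))


if __name__ == "__main__":
    title = "Dietary Reference Intakes: Applications in Dietary Planning"
    print(loosely_matches("dietary reference intake", title))
    print(loosely_matches("applications of dietary planning", title))
```

Under this model, a singular form matches a plural title word and a phrase with a different stop word still matches, which is the kind of looseness to expect from stemmed phrase searching.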

Often, the two strategies can be combined. Use a title search to see if the book is in one of the databases. Then, use the in-text phrase search to find the appropriate passage in the book. I have used this search to find the title of a work and then searched that title as a phrase to check its availability at Amazon or Google. Frequently, I search both databases and the open Web.

ACCESS DISPARITY

One of the many problems in the current state of free book content searching is the wide disparity of access. It seems that most publishers that wish to provide free content online work with Amazon and Google. However, many only work with one or the other, while some publishers provide more free online content on their own sites than at either Amazon or Google.

Take the example of the ever-popular Dietary Reference Intakes: Applications in Dietary Planning published by the National Academies Press. Searching a phrase from page 20 of the book, “nutrient intakes feeds,” gets no hits at Google. Amazon and A9 find one hit, but it is for a different book.

Using the second strategy and searching the title itself as a phrase finds the book at A9 and Amazon, but there is no “Search Inside This Book” option. Google Print appears to pull up the work with the search book “Dietary Reference Intakes: Applications in Dietary Planning”. It lists the result as Dietary Reference Intakes, but it is actually a separate work in that series (subtitled Guiding Principles for Nutrition Labeling and Fortification, although Google does not give this bibliographic information except in the enlarged cover image). But the search is not yet over.

On the library database side, WorldCat lists three entries for this title, each with an “Internet Resource” tag and a corresponding URL. One is for a copy in netLibrary; another is from Ebrary; and the third, connected with the print record, is for a table of contents available at the Library of Congress. None of these sources find the free full text that is available directly from the National Academies Press Web site.

In this case, just searching the title as a phrase at Yahoo!, Google, Teoma, or MSN will bring up the free full text [www.nap.edu/books/0309088534/html] with each page available as a GIF or PDF.

Another example: A search for “true power of grouping,” which is from Managing and Using MySQL, second edition, strikes out at Google, A9, Amazon, Yahoo!, and MSN. Surprisingly, Ask Jeeves finds a hit for this phrase at a site that requires a user name and password, which in turn implies that the book is available from a commercial source as an e-book.

Knowing that far too many copyrighted books have been posted somewhere online, I searched for another phrase from earlier in the book. This time, a search engine found a PDF version of the entire book at an academic site in China (which also included dozens of other books).

Both Amazon and Google Print include Managing and Using MySQL, but Amazon includes only excerpts from the book, while Google only finds it if searched by title (preceded by book). In other words, a regular Google search for a phrase from a book may not find a Google Print record. Therefore, search by the title as well. Then try it again since Google Print results do not always display. On one search, I got no “Book results for . . .” Yet, just clicking the Search button one more time displayed some book results. Expect inconsistencies.

CURRENT COMPARISONS

The Amazon and Google databases may change dramatically in the next few months. Both are basically still experimental programs, and the companies need to carefully balance access with copyright limitations. The size and scope of the databases depend on which publishers grant permission. With the library agreements, Google’s database should be huge, but there are few library titles included at this point. Google predicts a 6-year timeline to finish.

Amazon generally found more titles than Google for several searches I tried. However, a significant problem with such a comparison is that the full text of many books available from Google is not directly searchable. Searching a phrase from a book may get no results on a Google search, but searching for the title and then using the search box within the Google Print display did find the page. An example is searching for the phrase “update shareholders and customers”, which occurs on page 40 of Information Technology Security. That search finds no results at Google; the same phrase finds the book and extract at both A9 and Amazon. Once you know the title of the book, Google finds it easily enough with the search book information technology security.

A search for the phrase “clung to bohemian ideals” at Google found zero results. At A9, in the books column, the search finds Artists, Advertising, and the Borders of Art by Michele H. Bogart, but the link only goes to the book at Amazon and not to the specific page. Searching the phrase directly at the Amazon book search results in the message:

Book search results: we found no results that closely match your search for: “clung to bohemian ideals”

Click here to see additional results that may be relevant to your search.

Only after following the link in that last line will Amazon give a Key Word in Context (KWIC) extract and the link to the exact page with the quote.

Score? Google gets zero for not finding the book unless you already know it is there. A9 starts better, but without the extract or a direct link to the page, A9 still requires a user to repeat the same search at Amazon once the book has been located. Starting the search at Amazon gives a KWIC extract after the second click, with the full-page image only one more click away.

However, for every example listed, I also found times when each of these behaved quite differently on another search. In one case, a regular Google search found a Google Print source where the search term was on page 773. At times, A9 does include a KWIC extract and a direct link to the page. Occasionally, Amazon also directly displays Search Inside the Book KWIC extracts without the “Click here to see . . .” message.

Again, expect many changes ahead for both of these endeavors. Access and scope may both change. Certainly, we can hope that Google will provide more direct access to its database. In the meantime, consider what opportunities these databases offer for the searchers’ toolbox.

 


Greg R. Notess [greg@notess.com; http://www.notess.com/] is a reference librarian at Montana State University and founder of SearchEngineShowdown.com.

Comments? E-mail letters to the editor to marydee@infotoday.com.

