[ONLINE]
on the net
Greg Notess
Reference Librarian
Montana State University












Search Engine Inconsistencies

ONLINE, March 2000
Copyright © 2000 Information Today, Inc.


According to Aldous Huxley, "Consistency is contrary to nature, contrary to life." Far too often, it is also contrary to the practices of the Internet search engines. The databases, search interfaces, indexing, storing, and processing are all computer-based functions. In general, a computer's strength is consistent processing: in most cases, it will do exactly the same task when given exactly the same input.

Consistency in search processing certainly makes it easier on the information professional. When an information system responds consistently, it is easier to know when a search has been comprehensive and when to move on to another system.

Unfortunately for the consistency connoisseur, many of the less well-known Internet search tools are hastily constructed. Meanwhile, the top Internet search engines have had extensive development of their interface, but with the general aim of providing a few relevant answers quickly to almost any kind of search. In neither case is search consistency necessarily a high priority.

COUNTING CONSISTENCY

At their most basic level, computers excel at counting. Yet some Internet search engines do not. Several search engines have, for a time, stopped reporting result counts at all. An advanced search on Lycos gives no total number, even though a regular Lycos search does. For a brief time, Excite stopped reporting the total number of hits, but then it put the numbers back in, although in a much smaller font size.

Most search engines provide the number of results for every search. However, the numbers are not always accurate. First of all, they have to figure out what to count. A portal that gives results from several databases could count the hits from each one or the total hits from all. Excite does not count the results from its directory categories or its general information, but it does provide separate counts for its Web results (its search engine database) and its news database.

Another complicating factor appears when a search engine clusters by site. When several pages matching the search criteria are all grouped under one record for the site, should the search engine count it as one hit or several? The simple approach would be to count sites as a separate number from pages, but the search engines that do cluster take several different approaches.

Infoseek just counts the total number of pages and reports that number, not the number of sites. Northern Light also counts the total number of hits and sometimes will report it as "115 items in 48 sources," where sources can be either a Web site or a Special Collection publication.
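The difference between counting pages and counting sites can be sketched in a few lines of Python. This is only an illustration of the "simple approach" mentioned above, not how any of these search engines actually work: the function name count_hits, the sample URLs, and the choice to treat each host name as one site are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hits(urls):
    """Return (page_count, site_count) for a list of result URLs.

    A "site" here is simply the host name of the URL, which is an
    assumption for illustration; real engines may cluster differently.
    """
    pages = len(urls)
    hosts = Counter(urlsplit(u).netloc.lower() for u in urls)
    return pages, len(hosts)


# Hypothetical result URLs, invented for this example.
results = [
    "http://www.example.edu/ticks/index.html",
    "http://www.example.edu/ticks/andersoni.html",
    "http://www.example.org/fever.html",
]
pages, sites = count_hits(results)
print(f"{pages} pages in {sites} sites")
```

Reporting both numbers side by side, much as Northern Light does with items and sources, leaves no doubt about which one the searcher is looking at.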

HotBot Loses Count

HotBot has much greater problems with counting its clustered results. Since it started clustering results, the reported number of matches appears to be the approximate number of sites, not pages. And if the number of matches is more than a few hundred, the reported number of matches is always a multiple of ten. However, the real problem with HotBot is that the reported number of matches seems to have no connection with what you can actually retrieve. And the number can change as you move to the next page.

For example, a HotBot search on hypercarbia with display set to 100 hits reportedly found 200 matches (a multiple of ten). After clicking on the next page, HotBot then reported and displayed only 120 records. Were the other 80 hiding under the site clustering? To turn off site clustering, I tried the same search limited to .com, .edu, and .org sites. With that limit, HotBot then reported 150 matches. Clicking on the next page changed the reported number to 120 matches and displayed numbers 101-123.

Inconsistent HotBot Results Depending on Kind of Search

HotBot Option       fallugia   garrya   kanoodle
All the words             22       77         35
Any of the words          23       77         35
Exact phrase              23       78         27
Boolean phrase            22       78         27

AltaVista Altercations

AltaVista also has long had difficulty counting and displaying a consistent count, and for that reason will sometimes say that it has "about" X number of hits. This inconsistency (and others) seems to derive from AltaVista's programmed preference for speed over comprehensiveness. Rather than waiting to finish processing the search, AltaVista will deliver partial results, especially on large or complex searches.

Theoretically, reloading the search or going to the next results page will give AltaVista more time to process the search and possibly retrieve more results. In reality, it may find more or fewer. This is also why the numbers can differ when two people do exactly the same search: the search will time out and deliver partial results before it is finished.

AltaVista also clusters results by site, at least on its Simple Search. For example, a search on the phrase dermacentor andersoni finds a reported 144 pages. However, many of the hits have additional matching pages from the same site clustered under a "More pages from this site" link. So, there are actually more than 144 hits. Run the same search in the advanced search and AltaVista now claims to find 237 pages. Yet, despite repeated attempts at reloading, it would only display 230.

While multiple databases, site clustering, and time-outs are some reasons that search engines may have difficulty counting, this certainly does not explain all counting inconsistencies--sometimes it is just strange programming. For example, Ancestry.com, which provides access to hundreds of genealogical databases including the Social Security Death Index, cannot even count to zero. Try a search on any non-existent name to see the statement that you are "Viewing records 1-0 of 0." How about a simple "no records found" message?
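Handling the zero case takes very little code. The following Python sketch shows one way a results page might build its status line. The function name results_banner and its parameters are hypothetical, and nothing here is based on how Ancestry.com is actually programmed.

```python
def results_banner(first, page_size, total):
    """Build a status line for one page of results.

    first is the 1-based index of the first record on the page.
    """
    if total <= 0:
        # Check for the empty result set before computing any range.
        return "No records found."
    last = min(first + page_size - 1, total)
    return f"Viewing records {first}-{last} of {total}."


print(results_banner(1, 10, 0))
print(results_banner(21, 10, 25))
```

Testing for an empty result set before computing the range is what avoids the "1-0 of 0" message.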

While HotBot, AltaVista, and Ancestry have difficulties in reporting an accurate number of results, many other search engines do manage to count quite accurately. But watch out for inaccurate counting on niche Internet databases, multiple search engines, and other specialized search tools.

PROCESSING PROBLEMS

Inconsistencies go beyond the inability to count. Problems with processing the search syntax can result in strange results as well. Truncation, field searching, and even basic Boolean processing may not work as expected. As with the counting, some search engines process the searches consistently, but others do not.

Sometimes inconsistencies pop up unexpectedly. Take Google! for example. It has few advanced search features beyond phrase searching, the link: field search, and the minus (-) for NOT operations. In general, these features work as expected. However, try combining a link: field search and the minus for NOT. Suddenly the minus no longer excludes anything. A search of link:av.com -search finds exactly the same number of hits as link:av.com, even when plenty of those records do contain the word "search."

HotBot Single-Word Search

A single-term search is a fairly basic search. HotBot offers drop-down menu options for searching in several different ways, including all the words, any of the words, exact phrase, and Boolean phrase. On a single-word search, logic says that each of those options should find the same results. Unfortunately, they do not. The table shows the results for searching the same single term with each of the four options.

While HotBot cannot reliably count higher numbers, in each of these instances the number reported exactly matched the number of records displayed. I double-checked all of these numbers, even using a different Web browser to be sure the results were not coming from the cache. Other Inktomi search engines that offer all four options could consistently find the same number, even at Canada.com, which also clusters results by site as HotBot does.

Note that there is not even any consistency as to which of the four search options displays more records. In these examples, each search had two options that found the most records, but they were not the same two for every search. On other searches where I have tried this, the difference in numbers has been even greater.
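Anyone who wants to track this kind of discrepancy over time could record the counts and compare them automatically. The Python sketch below uses the numbers from the HotBot table in this article. The short option labels and the function name inconsistent_terms are my own shorthand for the example, not anything HotBot provides.

```python
# Counts copied from the HotBot table in this article.
counts = {
    "fallugia": {"all": 22, "any": 23, "phrase": 23, "boolean": 22},
    "garrya":   {"all": 77, "any": 77, "phrase": 78, "boolean": 78},
    "kanoodle": {"all": 35, "any": 35, "phrase": 27, "boolean": 27},
}


def inconsistent_terms(table):
    """Return {term: (lowest, highest)} for terms whose counts disagree."""
    report = {}
    for term, by_option in table.items():
        values = by_option.values()
        # A single-word search should give one count for every option.
        if len(set(values)) > 1:
            report[term] = (min(values), max(values))
    return report


for term, (low, high) in inconsistent_terms(counts).items():
    print(f"{term}: counts range from {low} to {high}")
```

All three terms are flagged, since none of them returned the same count under all four options.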

AltaVista's Turn

AltaVista has shown similar examples of processing problems. Many of these can, perhaps, be explained by AltaVista's time-outs. The incomplete processing will certainly cause inconsistent results on a number of searches. Yet there are processing inconsistencies beyond that as well.

One example is AltaVista's ability to search diacritics. Input a search term without diacritics and AltaVista is supposed to search for matches with or without diacritics. Use the diacritics in the search term and only exact matches should result. So, a search on the French term éléphant should find fewer hits than a search on elephant. In general this works, but the search +éléphant -elephant should find zero hits, since all of the records found with the required first term would also be excluded by the second. However, AltaVista usually retrieves a few hundred hits anyway.
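The reasoning behind that zero can be checked with a small Python sketch of the diacritics rule as described above. It is a model of the documented behavior only, not AltaVista's code. The helper names strip_diacritics and term_matches and the word list are invented for the example, and case is ignored to keep the sketch short.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, so that an accented e becomes a plain e."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_matches(query, word):
    """Apply the diacritics rule described above (case is ignored here)."""
    query = unicodedata.normalize("NFC", query).lower()
    word = unicodedata.normalize("NFC", word).lower()
    if query == strip_diacritics(query):
        # Unaccented query: match the word with or without diacritics.
        return strip_diacritics(word) == query
    # Accented query: only an exact match counts.
    return word == query


# Hypothetical indexed words, invented for this example.
words = ["elephant", "éléphant", "Éléphant", "elephants"]
required = {w for w in words if term_matches("éléphant", w)}
excluded = {w for w in words if term_matches("elephant", w)}
print(required - excluded)  # an empty set under the documented rule
```

Every word matched by the accented query is also matched by the unaccented one, so requiring the first and excluding the second should leave nothing.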

A similar example involved AltaVista's case sensitivity. Searches using all lowercase letters are supposed to match any case or mixture of cases. Including a single uppercase letter in a search term is supposed to require an exact match of case. Thus, qwerty would match qwerty, QWERTY, and qWeRtY, while qWeRtY would only match qWeRtY. Yet for a while, a search on Fe (as in Santa Fe) found more hits than fe, which should have found more.
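The case rule is simple enough to express as a short Python sketch. As with any outside model, this only restates the documented behavior and assumes nothing about AltaVista's internals. The function name case_matches and the word list are hypothetical.

```python
def case_matches(query, word):
    """All-lowercase queries match any case; otherwise case must match exactly."""
    if query == query.lower():
        return word.lower() == query
    return word == query


# Hypothetical indexed words, invented for this example.
words = ["fe", "Fe", "FE", "Santa", "Fe"]
lower_hits = [w for w in words if case_matches("fe", w)]
mixed_hits = [w for w in words if case_matches("Fe", w)]
print(len(lower_hits), len(mixed_hits))
```

Under this rule, the hits for Fe are always a subset of the hits for fe, so the lowercase search can never find fewer.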

Yet not even AltaVista's inconsistencies are consistent. The Fe problem only lasted for a few days, back in May 1999. Then it suddenly started working as it should, with a search on fe finding more than a search on Fe.

Multiple search engines run into similar difficulties. Because multiple search engines pull their results from other databases, they must constantly keep up with any changes to their component search engines. The multiple search engine must parse the results list, stripping out links to the rest of the search engine site, advertisements, and any other extraneous material.

Thus, the algorithm that worked yesterday may fail today. Sometimes ads or internal links get included by the multiple search engines as regular results. At other times, specific search engine results are missed altogether. Watch especially for claims of hits from Northern Light and HotBot, and double-check directly on those search engines to be sure their hits are included.

WHAT TO DO

Some inconsistencies are due to a temporary situation, while others continue to recur. It makes for a challenging search environment. The inconsistencies are certainly frustrating, but they do not mean that you should stop using a specific search engine just because of the problems. Some of the most inconsistent engines still provide content, search features, and database records not available in any other sources.

Just be prepared for inconsistent behavior, know some of the ways in which the search engines fail to deliver as promised, and plan search strategies accordingly. To help keep track of which search engines have particular problems, check the inconsistencies section of my Search Engine Showdown site at http://notess.com/search/. It documents current and past inconsistencies and provides an opportunity for others to report inconsistencies. Sharing and tracking details of these problems can help us better understand the results that the search engines deliver. As to the inconsistent search engines, Walt Whitman said it best:

Do I contradict myself? Very well then I contradict myself, (I am large, I contain multitudes).


Greg R. Notess (greg@notess.com; http://www.notess.com/) is a Reference Librarian at Montana State University.

Comments? Email letters to the Editor at editor@infotoday.com.
