






Online Magazine
Vol. 28 No. 4 — July/August 2004
On The Net
The New Yahoo! Search
By Greg R. Notess
Reference Librarian Montana State University

The first half of 2004 has witnessed major changes in the search engine industry, especially in terms of which search engines still use their own databases and which use one from someone else. Yahoo!, AlltheWeb, AltaVista, and Lycos have all changed their underlying databases, and in this new environment, the remaining search engines that have their own, unique databases are Yahoo!, Google, Teoma, WiseNut, and some smaller ones such as Gigablast.

Perhaps the most significant change is the new Yahoo! Search database and the loss of the unique AlltheWeb and AltaVista databases. First, in February, Yahoo! stopped using the Google database and launched its own search engine database. In Yahoo!'s earlier days, it featured hits from its directory followed by results from a search engine partner (AltaVista, Inktomi, and then Google). Then in October 2002, it pushed its Google search results front and center and left only a few directory category links at the top.

After Yahoo!'s acquisition of Inktomi, AlltheWeb, and AltaVista, everyone expected major changes at all those sites. And sure enough, in March, after the launch of the new Yahoo! search engine, both AlltheWeb and AltaVista stopped maintaining their own databases and switched to a version of the new Yahoo! database. Shortly after that, Lycos also switched from the AlltheWeb database to the Yahoo! database. With a new Yahoo! database and significant changes at other search engines, it is well worth taking a closer look at Yahoo!'s database and search features.

UNDERSTANDING THE DATABASES

What is this new Yahoo! database? Does it come from Inktomi, AltaVista, AlltheWeb, or some combination of all three? According to Tim Mayer at Yahoo!, the new Yahoo! Search is a new database, built with technology and engineers from all three. It seems to be primarily Inktomi-based, but it is different from the old Inktomi.

Beyond the Yahoo!-owned sites, various partners also use Yahoo!'s database. For example, both Lycos and HotBot (when using the "HotBot" database) use Yahoo!, although still under the Inktomi name. MSN Search continues to use Yahoo!'s database, even while working on building its own.

So, is the database at Yahoo! exactly the same as the one now at AlltheWeb, AltaVista, Lycos, and MSN Search? No. There are some differences. The AltaVista, AlltheWeb, and Lycos Web databases now seem identical, but the ranking can differ. HotBot is very similar, but I have found a few different records there. MSN Search is also quite similar, but it will often find a few records that the others do not. But the biggest difference is that Yahoo! itself finds more records than any of its partners.

In one typical search run at all of these databases for a fairly unusual word, AltaVista, AlltheWeb, and Lycos all showed exactly the same 13 results. HotBot also found 13, but one of the URLs was different. MSN Search found 15, including one that none of the other four included. The same search run directly at Yahoo! found 43 records, including every one found by its partners.

This kind of difference shows up on many searches and makes Yahoo! itself the largest of the various sources for the Yahoo! (or Inktomi) database. For that particular search, Yahoo! even found more than Google, which found only 33.

IMPORT FOR SEARCHERS

First of all, anyone whose Web site has a page listing and linking to various search engines needs to reconsider which ones to include. For Yahoo!, part of the question to consider is which version or versions to include. Yahoo! now has three separate URLs for its three faces:

• The main portal entryway to all of its services remains at www.yahoo.com

• The directory is available at dir.yahoo.com

• The search engine is at search.yahoo.com

Note that all three still have a Yahoo! search box. Except at the directory, all of them search the Yahoo! search engine database. At the directory, the search defaults to searching only the directory, but simply changing the radio button to "the Web" switches back to the larger search engine database.

Is there any reason to continue linking to AlltheWeb, AltaVista, Lycos, or MSN Search? Yahoo! is planning to continue supporting the AlltheWeb and AltaVista sites, and says that the sites have different user demographics and can now rank results differently. We may well see some further differentiation between the databases and search features, but at this point there seems to be little incentive to use them or link to them, except that AlltheWeb and AltaVista both offer audio and video databases that are not yet available at Yahoo! or other sites.

As to partners like MSN and Lycos, again, until these sites differentiate themselves a bit more, there is little that is unique to offer the advanced searcher. MSN is worth watching, as it may begin experimenting with its own database, but that may not be for a year or more.

SEARCH FEATURES

With the loss of the unique databases at AltaVista and AlltheWeb, several advanced search features have disappeared as well. None of the search engines have truncation (wild cards) or proximity operators like NEAR that AltaVista used to have. The file size and IP address limits have disappeared from AlltheWeb along with the indexing of text content within Flash files.

But at the same time, Yahoo! now supports other advanced search features that were not available before when it was using the Google database. In addition to the Yahoo! database itself, full Boolean searching, more comprehensive link searching, embedded content limits, and more features make Yahoo! well worth a visit.

BOOLEAN SEARCHING

While the help files and advanced search page do not make it explicit, Yahoo! Search can now handle fully nested Boolean queries. Just be sure to type the operators in all uppercase letters: AND, OR, and NOT. Either AND NOT or NOT works for exclusion, and parentheses can group terms into nested expressions. The Advanced Search page also has forms for "all of these words," "any of these words," and "none of these words," but it does not have a labeled Boolean Search box like AlltheWeb and AltaVista. However, since Boolean searching works directly from the search box, such a box is not really necessary.

For the simplified forms of Boolean, Yahoo! defaults to an AND operation and accepts the use of the minus sign (-) for a NOT operation. However, trying to combine Boolean operators and the minus sign can confuse the search engine, so it is best either to create a full Boolean expression or to not use any operators other than the minus sign.
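To make the nesting concrete, here is a minimal Python sketch that only assembles query text following the rules above: uppercase AND, OR, and NOT, parentheses for grouping, and no minus signs mixed into a full Boolean expression. The helper names (any_of, all_of, excluding) and the sample terms are hypothetical; nothing here calls Yahoo! or any other service.

```python
# Hypothetical helpers (not a Yahoo! API): they only assemble query text in
# the uppercase, parenthesized Boolean style described above.

def _check(term: str) -> str:
    # Keep the minus-sign shorthand out of full Boolean expressions,
    # since combining the two can confuse the search engine.
    if term.startswith("-"):
        raise ValueError(f"use AND NOT instead of a minus sign: {term!r}")
    return term


def any_of(*terms: str) -> str:
    """Join alternatives with OR and wrap them in parentheses."""
    return "(" + " OR ".join(_check(t) for t in terms) + ")"


def all_of(*parts: str) -> str:
    """Join required parts with an explicit, uppercase AND."""
    return " AND ".join(_check(p) for p in parts)


def excluding(query: str, *terms: str) -> str:
    """Append one AND NOT clause per excluded term."""
    for term in terms:
        query += f" AND NOT {_check(term)}"
    return query


query = excluding(
    all_of(any_of("librarian", "librarians"), any_of("montana", "idaho")),
    "jobs",
)
print(query)
# (librarian OR librarians) AND (montana OR idaho) AND NOT jobs
```

Building the whole expression with explicit operators, rather than sprinkling in minus signs, follows the advice above to keep the two styles separate.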

LINK SEARCHING

A common complaint about Google is its limited link searching ability. Even though it knows about many links between Web pages, doing a link search at Google does not retrieve all the pages that it knows are linking to the entered URL. Nor can you combine a link search with other words. This means that a searcher cannot exclude pages on the same site as the entered URL from being listed.

For a long time, AlltheWeb has been recommended as the best option for a more comprehensive link search. Now Yahoo! fills that role, but the syntax is a bit different. When doing a link search at Yahoo!, another option not yet available on its advanced search page, use link: followed by the full URL, including the http://. It does not work if you omit the http:// prefix. However, if you then wish to exclude results from the same site, using the site: command, you will need to drop the http:// prefix. For example, to find pages linked to www.college.edu but not on that site, use

link:http://www.college.edu NOT site:college.edu
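The link-search rules above (the full URL with its http:// prefix after link:, the bare host name after site:) can be captured in a small Python helper. This is a sketch under stated assumptions: link_query is a hypothetical name, operators are written without a space after the colon, and stripping a leading "www." from the site: value simply mirrors the college.edu example. It requires Python 3.9 or later for str.removeprefix.

```python
from urllib.parse import urlparse


def link_query(url: str, exclude_own_site: bool = True) -> str:
    """Build a link search string in the syntax described above (hypothetical helper)."""
    # link: needs the full URL, including the http:// prefix.
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    query = f"link:{url}"
    if exclude_own_site:
        # site: takes the host name without the http:// prefix. Dropping a
        # leading "www." mirrors the college.edu example (an assumption).
        host = (urlparse(url).hostname or "").removeprefix("www.")
        query += f" NOT site:{host}"
    return query


print(link_query("www.college.edu"))
# link:http://www.college.edu NOT site:college.edu
```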

EMBEDDED CONTENT LIMIT

Another loss: AlltheWeb had the ability to search for Web pages that contained certain types of embedded content such as audio files, Flash, or videos. Since Inktomi has also had this feature in the past (and it is still listed on the MSN advanced search page), it seems like it should also be available at Yahoo!.

Although it is not yet listed in either the help pages or on the Yahoo! Advanced Search page, the feature is available at Yahoo! through some of Inktomi's command line syntax. Just use feature: followed by the type of embedded content. Yahoo! recognizes the following:

acrobat

applet

activex

audio

embed

flash

form

frame

image

script

shockwave

table

video

vrml

For example, search feature:flash information literacy tutorials to find pages with embedded, or linked, Flash tutorials. Hopefully, Yahoo! will add checkbox options to its advanced search page to make this easier.
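Because only the listed content types are recognized, a helper that checks the type before building the query can catch typos. The Python sketch below is hypothetical (feature_query is not part of any Yahoo! tool) and assumes, as with the other operators, no space between feature: and the type.

```python
# The embedded-content types listed above for the feature: operator.
FEATURES = frozenset({
    "acrobat", "applet", "activex", "audio", "embed", "flash", "form",
    "frame", "image", "script", "shockwave", "table", "video", "vrml",
})


def feature_query(feature: str, *words: str) -> str:
    """Prefix a keyword search with feature:<type> (hypothetical helper)."""
    feature = feature.lower()
    if feature not in FEATURES:
        raise ValueError(f"not a recognized feature type: {feature!r}")
    return " ".join([f"feature:{feature}", *words])


print(feature_query("Flash", "information", "literacy", "tutorials"))
# feature:flash information literacy tutorials
```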

WILD CARD WORD IN A PHRASE

AltaVista used to offer a wild card word in a phrase technique, as does Google, where an asterisk can be used within a phrase search to match any word in that position. That is now gone at AltaVista, but Yahoo! actually does support it. At first, you had to use a stop word instead of an asterisk, but now it works just as it did at AltaVista and still does at Google, so that the asterisk can represent any single word in that exact position.

For example, to find "addictive semiconscious vice of biblioscopy" when you are not sure of the third word, search "addictive semiconscious * of biblioscopy". Multiple wild card words can be used, as in "addictive * * of biblioscopy".
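The rule that each asterisk stands for exactly one word in that position can be modeled locally with a regular expression. This Python sketch illustrates the matching semantics only; it is not how Yahoo! or Google implement phrase search, and the sample page text is made up.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Compile a phrase in which each standalone * stands for exactly one word."""
    parts = [r"\w+" if token == "*" else re.escape(token) for token in phrase.split()]
    # Words in the phrase must appear in order, separated by whitespace.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


page = "Reading is an addictive semiconscious vice of biblioscopy."
for query in ("addictive semiconscious * of biblioscopy",
              "addictive * * of biblioscopy",
              "addictive * of biblioscopy"):
    print(query, "->", bool(phrase_pattern(query).search(page)))
# addictive semiconscious * of biblioscopy -> True
# addictive * * of biblioscopy -> True
# addictive * of biblioscopy -> False
```

The last query fails because one asterisk can fill only one word, while two words sit between "addictive" and "of".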

DISPLAY

Yahoo! offers several useful features in its display of search results that other search engines would do well to copy. First of all, it numbers the search results, making it much easier to keep track of which ones you have already seen. Also, instead of the measly 10 results at a time that is the default at too many other search engines, Yahoo! displays 20 at a time by default.

Now that Google has banished its directory from its front page and search results, at least Yahoo! continues to have links at the top to a few matching directory categories. In addition, search engine results that are also in the directory have an extra line with their directory category.

On the down side, Yahoo! displays up to four ads above the search results, labeled as "sponsor results," which often can push the regular results down below the fold. At least there are no more than four ads in the right margin.

DIACRITICS

Another difference in processing at Yahoo! is how it handles diacritics. Diacritic processing makes a difference on searches such as cañón, which in Spanish has a different meaning from canon, a word that is also English (without the diacritics). In the tradition of AltaVista, Yahoo! Search will match query words without diacritics to pages that have the word either with or without diacritics. But a query word with diacritics should only find exact matches. In other words, canon finds pages with canon as well as those with cañón, but a search on cañón only finds pages that have the word with those exact diacritics.
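The asymmetric rule above can be expressed as a small Python function using Unicode normalization: decompose each word with NFD and drop the combining marks to get its unaccented form. This is a sketch of the described behavior, not Yahoo!'s implementation; the function names are hypothetical, and the case-insensitive comparison is an added assumption.

```python
import unicodedata


def fold(word: str) -> str:
    """Remove combining diacritical marks, e.g. 'cañón' -> 'canon'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query_word: str, page_word: str) -> bool:
    """Asymmetric diacritic matching as described above (case-insensitive here)."""
    q = unicodedata.normalize("NFC", query_word).casefold()
    w = unicodedata.normalize("NFC", page_word).casefold()
    if fold(q) == q:
        # No diacritics in the query: accept the word with or without them.
        return fold(w) == q
    # Diacritics in the query: require an exact match.
    return w == q


for query_word, page_word in [("canon", "cañón"), ("cañón", "canon"), ("cañón", "cañón")]:
    print(query_word, page_word, matches(query_word, page_word))
# canon cañón True
# cañón canon False
# cañón cañón True
```

Under this rule, searching with and without a diacritic can return quite different sets of pages.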

While looking for a search example to use in comparing the search engines, I checked some current searches at Metaspy [www.metafind.com/metaspy] and saw one for corbeil electroménagers.

Trying that search with the diacritic (e acute, or é) at both Yahoo! and Google found some exact matches, but not the actual store's Web site. Trying again with the regular e rather than the é found a different group of records at Google. At Yahoo!, however, it found many more, including, finally, the link to the Montreal store by that name. Sometimes searching for an exact match with diacritics works best, and at other times better results come from searching without them.

THE YAHOO! FUTURE

Since Yahoo!'s foray into running its own search engine database is new, it is likely to continue working on search. We may see a wide variety of changes, not only at Yahoo!, but at AltaVista, AlltheWeb, and MSN as well. Hopefully, all of the search features mentioned above will continue to function, and ideally, more will be added (or at least the advanced search page and the documentation will mention them).

With all the changes at various search engines, Yahoo! has now risen to the top as one of the major places to look beyond Google. Both now find Web pages that the other does not, and each has some unique search features. Additionally, both offer cached copies of Web pages, making Yahoo! the second major source for older pages. While the search engine space changes constantly, it looks likely that Yahoo! will be a significant player for some time to come.



Greg R. Notess (greg@notess.com; www.notess.com) is a reference librarian at Montana State University and founder of SearchEngineShowdown.com

Comments? Email the editor at marydee@infotoday.com
