On The Net
The New Yahoo! Search
By Greg R. Notess
Reference Librarian, Montana State University
 The first half of 2004 has witnessed major changes in the search
  engine industry, especially in terms of which search engines still use their
  own databases and which use one from someone else. Yahoo!, AlltheWeb, AltaVista,
  and Lycos have all changed their underlying databases, and in this new environment,
  the remaining search engines that have their own, unique databases are Yahoo!,
  Google, Teoma, WiseNut, and some smaller ones such as Gigablast.
 Perhaps the most significant change is the new Yahoo! Search database and
  the loss of the AlltheWeb and AltaVista unique databases. First, in February,
  Yahoo! stopped using the Google database and launched its own search engine
  database. In Yahoo!'s earlier days, it featured hits from its directory followed
  by results from a search engine partner (AltaVista, Inktomi, and then Google).
  Then in October 2002, it pushed its Google search results front and center
  and only left a few directory category links at the top.
  After Yahoo!'s acquisition of Inktomi, AlltheWeb, and AltaVista, everyone
  expected major changes at all those sites. And sure enough, in March, after the
  launch of the new Yahoo! search engine, both AlltheWeb and AltaVista ceased
  to have their own databases and instead switched to using a version of the
  new Yahoo! database. Shortly after that, Lycos also switched from using the
  AlltheWeb database to the Yahoo! database. With a new Yahoo! database and significant
  changes at other search engines, it is definitely worth taking a new and closer
  look at Yahoo!'s database and search features.
  UNDERSTANDING THE DATABASES
  What is this new Yahoo! database? Does it come from Inktomi, AltaVista, AlltheWeb,
  or some combination of all three? According to Tim Mayer at Yahoo!, the new
  Yahoo! Search is a new database, built with technology and engineers from all
  three. It seems to be primarily Inktomi-based, but it is different from the
  old Inktomi.
  Beyond the Yahoo!-owned sites, various partners also use Yahoo!'s database.
  For example, both Lycos and HotBot (when using the "HotBot" database) use Yahoo!,
  although still under the Inktomi name. MSN Search continues to use Yahoo!'s
  database, even while working on building its own.
  So, is the database at Yahoo! exactly the same as the one now at AlltheWeb,
  AltaVista, Lycos, and MSN Search? No. There are some differences. The AltaVista,
  AlltheWeb, and Lycos Web databases now seem identical, but the ranking can
  differ. HotBot is very similar, but I have found a few different records there.
  MSN Search is also quite similar, but it will often find a few records that
  the others do not. But the biggest difference is that Yahoo! itself finds more
  records than any of its partners.
  As a typical example, a search on a fairly unusual word was run at all of these
  databases. AltaVista, AlltheWeb, and Lycos showed the exact same 13 results.
  HotBot also found 13, but one of the URLs was different.
  MSN Search found 15, including one that none of the other four included. Trying
  the exact same search directly at Yahoo! found more than all the others. Yahoo!
  found 43 records, including all of those found by all of its partners.
  This kind of difference shows up on many searches and makes Yahoo! itself
  the largest of the various sources for the Yahoo! (or Inktomi) database, with
  significantly more results than any of its partners. In addition, for that
  particular search, Yahoo! even found more than Google, which found only 33.
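  The comparison described above can be repeated for any query by saving the
  result URLs from each search engine and comparing them as sets. The following
  is a minimal Python sketch of that bookkeeping, not a tool used for this column.
  The engine names and example.org URLs are placeholders, not the actual results,
  and the function names are invented for illustration.

```python
def compare_results(results_by_engine):
    """For each engine, count its results and list URLs no other engine found."""
    report = {}
    for engine, urls in results_by_engine.items():
        found_elsewhere = set()
        for other, other_urls in results_by_engine.items():
            if other != engine:
                found_elsewhere |= other_urls
        report[engine] = {
            "total": len(urls),
            "unique": urls - found_elsewhere,
        }
    return report


def covers_all(engine, results_by_engine):
    """True if `engine` found every URL that any engine in the comparison found."""
    everything = set().union(*results_by_engine.values())
    return results_by_engine[engine] >= everything


# Placeholder data: each value is the set of result URLs saved from one engine.
results = {
    "engine_a": {"example.org/1", "example.org/2"},
    "engine_b": {"example.org/1", "example.org/3"},
    "engine_c": {"example.org/1", "example.org/2", "example.org/3", "example.org/4"},
}

report = compare_results(results)
for engine in results:
    print(engine, report[engine]["total"], sorted(report[engine]["unique"]),
          covers_all(engine, results))
```

  An engine whose set covers all the others plays the role that Yahoo! played in
  the search described above.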
  IMPORT FOR SEARCHERS
  First of all, anyone whose Web site has a page listing and linking to various
  search engines needs to reconsider which ones to include. For Yahoo!, part
  of the question to consider is which version or versions to include. Yahoo!
  now has three separate URLs for its three faces:
 
   The main portal entryway to all of its services remains at www.yahoo.com  
   The directory is available at dir.yahoo.com  
   The search engine is at search.yahoo.com  
  Note that all three still have a Yahoo! search box. Except at the directory,
  all search the Yahoo! search engine database. At the directory, the search
  defaults to only searching the directory, but simply changing the radio button
  to "the Web" will go back to the larger search engine database.
  Is there any reason to continue linking to AlltheWeb, AltaVista, Lycos, or
  MSN Search? Yahoo! is planning to continue supporting the AlltheWeb and AltaVista
  sites, and says that the sites have different user demographics and can now
  rank results differently. And we may well see some further differentiation
  between the databases and search features, but at this point, there seems to
  be little incentive to use them or link to them except that both offer audio
  and video databases that are not yet available at Yahoo! or other sites.
  As to partners like MSN and Lycos, again, until these sites differentiate
  themselves a bit more, there is little that is unique to offer the advanced
  searcher. MSN is worth watching, as it may begin experimenting with its own
  database, but that may not be for a year or more.
  SEARCH FEATURES
  With the loss of the unique databases at AltaVista and AlltheWeb, several
  advanced search features have disappeared as well. None of the search engines
  have truncation (wild cards) or proximity operators like NEAR that AltaVista
  used to have. The file size and IP address limits have disappeared from AlltheWeb
  along with the indexing of text content within Flash files.
  But at the same time, Yahoo! now supports other advanced search features
  that were not available before when it was using the Google database. In addition
  to the Yahoo! database itself, full Boolean searching, more comprehensive link
  searching, embedded content limits, and more features make Yahoo! well worth
  a visit.
  BOOLEAN SEARCHING
  While the help files and advanced search page do not make it explicit, Yahoo!
  Search can now handle fully nested Boolean queries. Just be sure to use the
  operators in all uppercase letters: AND, OR, and NOT. Either AND NOT or NOT
  will work, and parentheses can be used to nest the queries. The Advanced Search page
  also has forms for "all of these words," "any of these words," and "none of
  these words," but it does not have a labeled Boolean Search box like AlltheWeb
  and AltaVista. However, since the Boolean searching works directly from the
  search box, it is not really necessary.
  For the simplified forms of Boolean, Yahoo! defaults to an AND operation
  and accepts the use of the minus sign (-) for a NOT operation. However, trying
  to combine Boolean operators and the minus sign can confuse the search engine,
  so it is best either to create a full Boolean expression or to not use any
  operators other than the minus sign.
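  For anyone who assembles such queries in a script, the rules above (uppercase
  operators, parentheses for nesting, and no mixing of Boolean operators with the
  minus sign) can be sketched in a few lines of Python. The helper names here are
  invented for illustration, the search terms are arbitrary, and the code only
  builds query strings for pasting into the search box. It does not contact
  Yahoo! in any way.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def quote(term):
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    return f'"{term}"' if " " in term else term


def any_of(*terms):
    """Combine plain terms with OR inside parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def all_of(*parts):
    """Combine already-built parts (terms or nested groups) with AND."""
    return "(" + " AND ".join(parts) + ")"


def but_not(query, excluded):
    """Exclude a term with the uppercase NOT operator."""
    return f"{query} NOT {excluded}"


def mixes_styles(query):
    """Flag queries that combine Boolean operators with the minus sign."""
    tokens = query.replace("(", " ").replace(")", " ").split()
    has_boolean = any(t in BOOLEAN_OPERATORS for t in tokens)
    has_minus = any(t.startswith("-") and len(t) > 1 for t in tokens)
    return has_boolean and has_minus


query = but_not(
    all_of(any_of("librarian", "information specialist"), "salary"),
    "school",
)
print(query)
print(mixes_styles("cats AND dogs -birds"))
```

  The first line printed is a nested expression that can be entered directly in
  the main search box. The second shows the check flagging a query that mixes
  the two styles.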
  LINK SEARCHING
  A common complaint about Google is its limited link searching ability. Even
  though it knows about many links between Web pages, doing a link search at
  Google does not retrieve all the pages that it knows are linking to the entered
  URL. Nor can you combine a link search with other words. This means that a
  searcher cannot exclude pages on the same site as the entered URL from being
  listed.
  For a long time, AlltheWeb has been recommended as the best option for a
  more comprehensive link search. Now Yahoo! fills that role, but the syntax
  is a bit different. When doing a link search at Yahoo!, another option not
  yet available on its advanced search page, use link: followed by the full URL,
  including the http://. It does not work if you omit the http:// prefix. However,
  if you then wish to exclude results from the same site, using the site: command,
  you will need to drop the http:// prefix. For example, to find pages linking
  to www.college.edu but not on that site, use
  link:http://www.college.edu NOT site:college.edu.
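  Because the two commands treat the http:// prefix differently, it is easy to
  get one of them wrong. The short Python sketch below builds the query string
  from a URL under a few assumptions of mine: that the operator and the URL are
  written with no space between them, that a leading "www." is dropped for the
  site: command (as in the college.edu example), and that only http:// URLs are
  involved. The function name is hypothetical.

```python
def link_query(url, exclude_same_site=True):
    """Build a link: query, optionally excluding pages on the linked site."""
    prefix = "http://"
    # link: needs the full URL, so add the prefix if it was left off.
    if not url.startswith(prefix):
        url = prefix + url
    query = "link:" + url
    if exclude_same_site:
        # site: takes the host name without the http:// prefix.
        host = url[len(prefix):].split("/", 1)[0]
        # Assumption: drop a leading "www." so the whole domain is excluded.
        if host.startswith("www."):
            host = host[len("www."):]
        query += " NOT site:" + host
    return query


print(link_query("www.college.edu"))
print(link_query("http://www.college.edu/library/", exclude_same_site=False))
```

  The first call reproduces the example query above. The second shows a link
  search on a specific page without the same-site exclusion.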
  EMBEDDED CONTENT LIMIT
  Another loss: AlltheWeb had the ability to search for Web pages that contained
  certain types of embedded content such as audio files, Flash, or videos. Since
  Inktomi has also had this feature in the past (and it is still listed on the MSN
  advanced search page), it seems like it should also be available at Yahoo!.
  While it is not yet listed in either the help pages or on the Yahoo! Advanced
  Search page, using some of Inktomi's command line syntax, it is available at
  Yahoo!. Just use feature: followed by the type of embedded content. Yahoo!
  recognizes the following:
 
    acrobat
    applet
    activex
    audio
    embed
    flash
    form
    frame
    image
    script
    shockwave
    table
    video
    vrml
  For example, search feature:flash information literacy tutorials to find
  pages with embedded, or linked, Flash tutorials. Hopefully, Yahoo! will add
  checkbox options to its advanced search page to make this easier.
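  Until such checkboxes exist, a small script can guard against mistyped feature
  names. This Python sketch checks a name against the list above before building
  the query. It assumes the command and the feature name are written with no
  space between them. The function name is invented for illustration, and
  nothing here queries Yahoo! itself.

```python
# Feature names taken from the list above.
FEATURE_TYPES = frozenset({
    "acrobat", "applet", "activex", "audio", "embed", "flash", "form",
    "frame", "image", "script", "shockwave", "table", "video", "vrml",
})


def feature_query(feature, *terms):
    """Build a feature: query, rejecting names that are not in the list."""
    name = feature.lower()
    if name not in FEATURE_TYPES:
        raise ValueError(f"unrecognized embedded content type: {feature!r}")
    return " ".join([f"feature:{name}", *terms])


print(feature_query("flash", "information", "literacy", "tutorials"))
```

  The printed string is the same search as the example above.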
  WILD CARD WORD IN A PHRASE
  AltaVista used to offer a wild card word in a phrase technique, as does Google,
  where an asterisk can be used within a phrase search to match any word in that
  position. That is now gone at AltaVista, but Yahoo! does support
  it. At first, you had to use a stop word instead of an asterisk, but now it
  works just like it did at AltaVista and still does at Google so that the asterisk
  can represent any one single word in that exact position.
  For example, to find "addictive semiconscious vice of biblioscopy" when you
  are not sure of the third word, search "addictive semiconscious * of biblioscopy".
  Multiple wild card words can be used, as in "addictive * * of biblioscopy".
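  The one-word-per-asterisk behavior can be illustrated with a short Python
  sketch. It builds the quoted query from a list of words, using None for each
  unknown word, and also builds a regular expression that applies the same rule
  to local text. The regular expression is only my approximation of the matching
  described above, not a description of how Yahoo! or Google implement it, and
  the function names are hypothetical.

```python
import re


def wildcard_phrase(words):
    """Build a quoted phrase query; None marks a position for any single word."""
    return '"' + " ".join("*" if w is None else w for w in words) + '"'


def phrase_pattern(words):
    """Approximate the same rule locally: each None matches exactly one word."""
    parts = [r"\S+" if w is None else re.escape(w) for w in words]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


words = ["addictive", "semiconscious", None, "of", "biblioscopy"]
print(wildcard_phrase(words))

pattern = phrase_pattern(words)
print(bool(pattern.search("the addictive semiconscious vice of biblioscopy")))
print(bool(pattern.search("addictive semiconscious of biblioscopy")))
```

  The second test string has no word in the third position, so it does not
  match. The asterisk stands for exactly one word, not zero or several.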
  DISPLAY
  Yahoo! offers several useful features in its display of search results that
  other search engines would do well to copy. First of all, it numbers the search
  results, making it much easier to keep track of which ones you have already
  seen. Also, instead of the measly 10 results at a time that is the default
  at too many other search engines, Yahoo! displays 20 at a time by default.
  Now that Google has banished its directory from its front page and search
  results, at least Yahoo! continues to have links at the top to a few matching
  directory categories. In addition, search engine results that are also in the
  directory have an extra line with their directory category.
  On the down side, Yahoo! displays up to four ads above the search results,
  labeled as "sponsor results," which often can push the regular results down
  below the fold. At least there are no more than four ads in the right margin.
  DIACRITICS
  Another difference in processing at Yahoo! is how it handles diacritics.
  Diacritic processing makes a difference on searches such as cañón
  in Spanish, which has a different meaning from canon in Spanish; canon
  is also an English word (without the diacritics). In the tradition of AltaVista,
  Yahoo! Search will match query words without diacritics to pages that have
  the word without diacritics as well as to those that have it with diacritics.
  But a query word with diacritics should only find exact matches. In other words,
  canon finds pages with canon as well as those with cañón. But
  a search on cañón only finds pages that have the word with those
  exact diacritics.
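  The matching rule just described is asymmetric, which a few lines of Python can
  make concrete. This is a sketch of the rule as stated here, using Unicode
  normalization from the standard library. It is not Yahoo!'s actual
  implementation, the function names are mine, and ignoring upper and lower case
  is an added assumption.

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining marks: 'cañón' becomes 'canon'."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)


def query_matches(query_word, page_word):
    """Apply the asymmetric rule described above to a single word."""
    # Assumption: matching ignores upper and lower case.
    query = unicodedata.normalize("NFC", query_word).casefold()
    page = unicodedata.normalize("NFC", page_word).casefold()
    if strip_diacritics(query) == query:
        # Query has no diacritics: match the page word with or without them.
        return strip_diacritics(page) == query
    # Query has diacritics: only an exact match counts.
    return page == query


for query, page in [("canon", "canon"), ("canon", "cañón"),
                    ("cañón", "cañón"), ("cañón", "canon")]:
    print(query, page, query_matches(query, page))
```

  Only the last pair fails to match: a query with diacritics does not find the
  plain spelling.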
  In looking for a search example to use for comparing the search engines,
  I was looking at some current searches at Metaspy [www.metafind.com/metaspy]
  and saw one for corbeil electroménagers.
  Trying that search with the diacritic (e acute, or é) at both Yahoo!
  and Google found some exact matches, but not the actual store's Web site. Trying
  again with the regular e rather than the é found a different group
  of records at Google. But at Yahoo!, it found many more, and finally, the link
  to the Montreal store by that name. Sometimes searching for an exact match
  with diacritics works best, and at other times better results can be obtained
  by searching without.
  THE YAHOO! FUTURE
  Since Yahoo!'s foray into running its own search engine database is new,
  it is likely to continue working on search. We may see a wide variety of changes,
  not only at Yahoo!, but at AltaVista, AlltheWeb, and MSN as well. Hopefully,
  all of the search features mentioned above will continue to function, and ideally,
  more will be added (or at least the advanced search page and the documentation
  will mention them).
  With all the changes at various search engines, Yahoo! has now risen to the
  top as one of the major places to look beyond Google. Both now find Web pages
  that the other does not, and each has some unique search features. Additionally,
  both offer cached copies of Web pages, making Yahoo! the second major source
  for older pages. While the search engine space changes constantly, it looks
  likely that Yahoo! will be a significant player for some time to come.
    
Greg R. Notess (greg@notess.com; www.notess.com) is a reference librarian at Montana State University and founder of SearchEngineShowdown.com.
Comments? Email the editor at marydee@infotoday.com.  