On The Net
The New Yahoo! Search
By Greg R. Notess
Reference Librarian
Montana State University
The first half of 2004 has witnessed major changes in the search
engine industry, especially in terms of which search engines still use their
own databases and which use one from someone else. Yahoo!, AlltheWeb, AltaVista,
and Lycos have all changed their underlying databases, and in this new environment,
the remaining search engines that have their own, unique databases are Yahoo!,
Google, Teoma, WiseNut, and some smaller ones such as Gigablast.
Perhaps the most significant change is the new Yahoo! Search database and
the loss of the AlltheWeb and AltaVista unique databases. First, in February,
Yahoo! stopped using the Google database and launched its own search engine
database. In Yahoo!'s earlier days, it featured hits from its directory followed
by results from a search engine partner (AltaVista, Inktomi, and then Google).
Then in October 2002, it pushed its Google search results front and center
and only left a few directory category links at the top.
After Yahoo!'s acquisition of Inktomi, AlltheWeb, and AltaVista, everyone
expected major changes at all those sites. And sure enough, in March, after the
launch of the new Yahoo! search engine, both AlltheWeb and AltaVista ceased
to have their own databases and instead switched to using a version of the
new Yahoo! database. Shortly after that, Lycos also switched from using the
AlltheWeb database to the Yahoo! database. With a new Yahoo! database and significant
changes at other search engines, it is definitely worth taking a new and closer
look at Yahoo!'s database and search features.
UNDERSTANDING THE DATABASES
What is this new Yahoo! database? Does it come from Inktomi, AltaVista, AlltheWeb,
or some combination of all three? According to Tim Mayer at Yahoo!, the new
Yahoo! Search is a new database, built with technology and engineers from all
three. It seems to be primarily Inktomi-based, but it is different than the
old Inktomi.
Beyond the Yahoo!-owned sites, various partners also use Yahoo!'s database.
For example, both Lycos and HotBot (when using the "HotBot" database) use Yahoo!,
although still under the Inktomi name. MSN Search continues to use Yahoo!'s
database, even while working on building its own.
So, is the database at Yahoo! exactly the same as the one now at AlltheWeb,
AltaVista, Lycos, and MSN Search? No. There are some differences. The AltaVista,
AlltheWeb, and Lycos Web databases now seem identical, but the ranking can
differ. HotBot is very similar, but I have found a few different records there.
MSN Search is also quite similar, but it will often find a few records that
the others do not. But the biggest difference is that Yahoo! itself finds more
records than any of its partners.
For a typical example of one search done at all of these databases using
a rather unique word, the exact same 13 results were shown by AltaVista and
AlltheWeb and Lycos. HotBot also found 13, but one of the URLs was different.
MSN Search found 15, including one that none of the other four included. Trying
the exact same search directly at Yahoo! found more than all the others. Yahoo!
found 43 records, including all of those found by all of its partners.
This type of results difference shows up on many searches and makes Yahoo!
itself the largest of the various sources for Yahoo! (or Inktomi) databases.
The other significance of these results is that Yahoo! now finds significantly
more results than any of its partners. In addition, for that particular search,
Yahoo! even found more than Google, which only found 33.
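For readers who want to repeat this kind of comparison, the overlap between result lists can be checked with simple set operations. The following is only a minimal sketch: the URLs are hypothetical placeholders (not the actual results from the search above), and compare is an illustrative helper name.

```python
# Hypothetical result sets: these URLs are placeholders, not real search output.
partner = {"a.example/1", "a.example/2", "a.example/3"}
hotbot = {"a.example/1", "a.example/2", "b.example/9"}
yahoo = partner | hotbot | {"c.example/4", "c.example/5"}


def compare(reference, other):
    """Summarize how the result set `other` overlaps with `reference`."""
    return {
        "shared": len(reference & other),
        "only_in_other": sorted(other - reference),
        "is_subset": other <= reference,
    }


print(compare(yahoo, partner))   # are all partner results also at Yahoo!?
print(compare(partner, hotbot))  # which records does only HotBot have?
```

Saving the result URLs from each search engine into a set like this makes it easy to see which source is a superset of the others.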
IMPORT FOR SEARCHERS
First of all, anyone whose Web site has a page listing and linking to various
search engines needs to reconsider which ones to include. For Yahoo!, part
of the question to consider is which version or versions to include. Yahoo!
now has three separate URLs for its three faces:
The main portal entryway to all of its services remains at www.yahoo.com
The directory is available at dir.yahoo.com
The search engine is at search.yahoo.com
Note that all three still have a Yahoo! search box. Except at the directory,
all search the Yahoo! search engine database. At the directory, the search
defaults to only searching the directory, but simply changing the radio button
to "the Web" will go back to the larger search engine database.
Is there any reason to continue linking to AlltheWeb, AltaVista, Lycos, or
MSN Search? Yahoo! is planning to continue supporting the AlltheWeb and AltaVista
sites, and says that the sites have different user demographics and can now
rank results differently. And we may well see some further differentiation
between the databases and search features, but at this point, there seems to
be little incentive to use them or link to them except that both offer audio
and video databases that are not yet available at Yahoo! or other sites.
As to partners like MSN and Lycos, again, until these sites differentiate
themselves a bit more, there is little that is unique to offer the advanced
searcher. MSN is worth watching, as it may begin experimenting with its own
database, but that may not be for a year or more.
SEARCH FEATURES
With the loss of the unique databases at AltaVista and AlltheWeb, several
advanced search features have disappeared as well. None of the search engines
have truncation (wild cards) or proximity operators like NEAR that AltaVista
used to have. The file size and IP address limits have disappeared from AlltheWeb
along with the indexing of text content within Flash files.
But at the same time, Yahoo! now supports other advanced search features
that were not available before when it was using the Google database. In addition
to the Yahoo! database itself, full Boolean searching, more comprehensive link
searching, embedded content limits, and more features make Yahoo! well worth
a visit.
BOOLEAN SEARCHING
While the help files and advanced search page do not make it explicit, Yahoo!
Search can now handle fully nested Boolean queries. Just be sure to use the
operators in all uppercase letters: AND, OR, and NOT. Either AND NOT or NOT
will work, as will parentheses around nested parts of the query. The Advanced Search page
also has forms for "all of these words," "any of these words," and "none of
these words," but it does not have a labeled Boolean Search box like AlltheWeb
and AltaVista. However, since the Boolean searching works directly from the
search box, it is not really necessary.
For the simplified forms of Boolean, Yahoo! defaults to an AND operation
and accepts the use of the minus sign (-) for a NOT operation. However, trying
to combine Boolean operators and the minus sign can confuse the search engine,
so it is best either to create a full Boolean expression or to not use any
operators other than the minus sign.
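As a rough illustration of these rules, the sketch below assembles a nested Boolean query with uppercase operators and parentheses, and flags a query that mixes Boolean operators with the minus sign. The helper names (any_of, all_of, exclude, mixes_styles) are hypothetical, and the code only builds and checks query strings. It does not contact Yahoo!.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}  # must be uppercase in the query


def any_of(*terms):
    """Join terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND inside parentheses."""
    return "(" + " AND ".join(terms) + ")"


def exclude(expression, unwanted):
    """Append a NOT clause to an existing expression."""
    return f"{expression} NOT {unwanted}"


def mixes_styles(query):
    """True if the query uses both Boolean operators and minus-sign exclusion."""
    tokens = query.replace("(", " ").replace(")", " ").split()
    has_boolean = any(t in BOOLEAN_OPERATORS for t in tokens)
    has_minus = any(t.startswith("-") and len(t) > 1 for t in tokens)
    return has_boolean and has_minus


query = exclude(all_of("library", any_of("catalog", "opac")), "museum")
print(query)
print(mixes_styles("library OR archive -museum"))
```

A query for which mixes_styles returns True is the kind that is best rewritten either as a full Boolean expression or with the minus sign alone.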
LINK SEARCHING
A common complaint about Google is its limited link searching ability. Even
though it knows about many links between Web pages, doing a link search at
Google does not retrieve all the pages that it knows are linking to the entered
URL. Nor can you combine a link search with other words. This means that a
searcher cannot exclude pages on the same site as the entered URL from being
listed.
For a long time, AlltheWeb has been recommended as the best option for a
more comprehensive link search. Now Yahoo! fills that role, but the syntax
is a bit different. When doing a link search at Yahoo!, another option not
yet available on its advanced search page, use link: followed by the full URL,
including the http://. It does not work if you omit the http:// prefix. However,
if you then wish to exclude results from the same site, using the site: command,
you will need to drop the http:// prefix. For example, to find pages linking
to www.college.edu but not on that site, use
link:http://www.college.edu NOT site:college.edu.
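The two prefix rules are easy to mix up, so here is a small sketch that builds the query string from a full URL. It assumes the operators are written directly against their values with no space, and it assumes that dropping a leading "www." gives the right site: value, as in the example above. The function name external_links_query is hypothetical.

```python
from urllib.parse import urlsplit


def external_links_query(url, site=None):
    """Build a link: query that excludes pages on the linked site itself."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("link: needs the full URL, including http://")
    if site is None:
        host = parts.hostname
        # Assumption: the site: limit takes the host without "www." or http://
        site = host[4:] if host.startswith("www.") else host
    return f"link:{url} NOT site:{site}"


print(external_links_query("http://www.college.edu"))
```

Passing site explicitly covers cases where the site to exclude is not simply the host without its "www." prefix.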
EMBEDDED CONTENT LIMIT
Another loss: AlltheWeb had the ability to search for Web pages that contained
certain types of embedded content such as audio files, Flash, or videos. Since
Inktomi has also had this feature in the past (and it is still listed on the MSN
advanced search page), it seems like it should also be available at Yahoo!.
While it is not yet listed in either the help pages or on the Yahoo! Advanced
Search page, using some of Inktomi's command line syntax, it is available at
Yahoo!. Just use feature: followed by the type of embedded content. Yahoo!
recognizes the following:
acrobat
applet
activex
audio
embed
flash
form
frame
image
script
shockwave
table
video
vrml
For example, search feature:flash information literacy tutorials to find
pages with embedded, or linked, Flash tutorials. Hopefully, Yahoo! will add
checkbox options to its advanced search page to make this easier.
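Until such checkboxes appear, a short script can play the same role. This is a minimal sketch, assuming the operator is typed with no space before the content type. The function name feature_query is hypothetical, and the list of types is simply the one given above.

```python
# Embedded content types listed above as recognized by Yahoo!
FEATURES = frozenset({
    "acrobat", "applet", "activex", "audio", "embed", "flash", "form",
    "frame", "image", "script", "shockwave", "table", "video", "vrml",
})


def feature_query(feature, *words):
    """Build a query limited to pages with the given embedded content type."""
    name = feature.lower()
    if name not in FEATURES:
        raise ValueError(f"unrecognized embedded content type: {feature}")
    return " ".join([f"feature:{name}", *words])


print(feature_query("flash", "information", "literacy", "tutorials"))
```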
WILD CARD WORD IN A PHRASE
AltaVista used to offer a wild card word in a phrase technique, as does Google,
where an asterisk can be used within a phrase search to match any word in that
position. That is now gone at AltaVista, but Yahoo! actually does support
it. At first, you had to use a stop word instead of an asterisk, but now it
works just like it did at AltaVista and still does at Google so that the asterisk
can represent any one single word in that exact position.
For example, to find "addictive semiconscious vice of biblioscopy" when you
are not sure of the third word, search "addictive semiconscious * of biblioscopy".
Multiple wild card words can be used as in "addictive * * of biblioscopy".
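The matching rule, where each asterisk stands for exactly one word in that position, can be imitated with a regular expression. This sketch only illustrates the behavior described here. It says nothing about how Yahoo! implements the feature, and phrase_pattern is a hypothetical helper.

```python
import re


def phrase_pattern(phrase):
    """Compile a phrase in which each * stands for exactly one word."""
    parts = [r"\w+" if word == "*" else re.escape(word)
             for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


text = "the addictive semiconscious vice of biblioscopy"

one_wild = phrase_pattern("addictive semiconscious * of biblioscopy")
two_wild = phrase_pattern("addictive * * of biblioscopy")
too_few = phrase_pattern("addictive * of biblioscopy")

print(bool(one_wild.search(text)))
print(bool(two_wild.search(text)))
print(bool(too_few.search(text)))
```

The last pattern does not match because a single asterisk cannot stand in for two words.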
DISPLAY
Yahoo! offers several useful features in its display of search results that
other search engines would do well to copy. First of all, it numbers the search
results, making it much easier to keep track of which ones you have already
seen. Also, instead of the measly 10 results at a time that is the default
at too many other search engines, Yahoo! displays 20 at a time by default.
Now that Google has banished its directory from its front page and search
results, at least Yahoo! continues to have links at the top to a few matching
directory categories. In addition, search engine results that are also in the
directory have an extra line with their directory category.
On the down side, Yahoo! displays up to four ads above the search results,
labeled as "sponsor results," which often can push the regular results down
below the fold. At least there are no more than four ads in the right margin.
DIACRITICS
Another difference in processing at Yahoo! is how it handles diacritics.
Diacritic processing makes a difference on searches such as cañón in Spanish,
which has a different meaning from canon in Spanish, and canon is also an English
word (without the diacritics). In the tradition of AltaVista,
Yahoo! Search will match query words without diacritics to pages that have
the word without diacritics as well as to those that have it with diacritics.
But a query word with diacritics should only find exact matches. In other words,
canon finds pages with canon as well as those with cañón. But a search on cañón
only finds pages that have the word with those exact diacritics.
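This one-way rule can be sketched in a few lines. The code below is only an illustration of the behavior as described, not Yahoo!'s actual processing, and the function names are hypothetical. It removes diacritics by decomposing each character and dropping the combining marks.

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining marks, so that cañón becomes canon."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query_word, page_word):
    """Apply the one-way diacritic matching rule described above."""
    query_word = unicodedata.normalize("NFC", query_word.lower())
    page_word = unicodedata.normalize("NFC", page_word.lower())
    if query_word == strip_diacritics(query_word):
        # Query has no diacritics: match the page word with or without them.
        return strip_diacritics(page_word) == query_word
    # Query has diacritics: only an exact match counts.
    return page_word == query_word


print(matches("canon", "cañón"))
print(matches("cañón", "canon"))
```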
In looking for a search example to use for comparing the search engines,
I was looking at some current searches at Metaspy [www.metafind.com/metaspy]
and saw one for corbeil electroménagers. Trying that search with the diacritic
(e acute, or é) at both Yahoo! and Google found some exact matches, but not
the actual store's Web site. Trying again with the regular e rather than the é
found a different group
of records at Google. But at Yahoo!, it found many more, and finally, the link
to the Montreal store by that name. Sometimes searching for an exact match
with diacritics works best, and at other times better results can be obtained
by searching without.
THE YAHOO! FUTURE
Since Yahoo!'s foray into running its own search engine database is new,
it is likely to continue working on search. We may see a wide variety of changes,
not only at Yahoo!, but at AltaVista, AlltheWeb, and MSN as well. Hopefully,
all of the search features mentioned above will continue to function, and ideally,
more will be added (or at least the advanced search page and the documentation
will mention them).
With all the changes at various search engines, Yahoo! has now risen to the
top as one of the major places to look beyond Google. Both now find Web pages
that the other does not, and each has some unique search features. Additionally,
both offer cached copies of Web pages, making Yahoo! the second major source
for older pages. While the search engine space changes constantly, it looks
likely that Yahoo! will be a significant player for some time to come.
Greg R. Notess (greg@notess.com; www.notess.com) is a reference librarian
at Montana State University and founder of SearchEngineShowdown.com.
Comments? Email the editor at marydee@infotoday.com.