DEPARTMENTS
Internet Search Engine Update
by Greg R. Notess
Reference Librarian, Montana State University
Internet
Search Engine Update goes up on the Web at http://www.onlinemag.net as soon as it is written,
approximately one month before the print issue mails
to subscribers.
AlltheWeb now
uses a keyword-in-context (KWIC) display in its search
results. It also has a new field search command, site:.
This is an easier-to-remember version of the older url.host:
and url.domain: field searches and can be used for top-level
domains and regular domains. For example, either site:edu
or site:company.com can be used and can be combined with
other search terms. AlltheWeb has also announced that
its site is now fully XHTML and CSS compliant.
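
To show what a site: limit means for both top-level domains and regular domains, here is a minimal Python sketch. It assumes the limit behaves like a host-suffix match, which is a guess for illustration and not AlltheWeb's documented implementation. The function name matches_site is hypothetical.

```python
from urllib.parse import urlsplit


def matches_site(url: str, site: str) -> bool:
    """Return True if the URL's host is `site` or a subdomain of it.

    Assumption: a site: limit behaves like a suffix match on whole
    host labels, so "edu" matches www.montana.edu but not
    www.education.org.
    """
    # urlsplit only finds a hostname when the URL includes a scheme.
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower().lstrip(".")
    return host == site or host.endswith("." + site)


print(matches_site("http://www.montana.edu/library/", "edu"))
print(matches_site("http://www.company.com/about", "company.com"))
print(matches_site("http://www.education.org/", "edu"))
```

The check compares whole labels rather than raw string endings, so site:company.com does not accidentally match a host such as notcompany.com.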
AltaVista has made some major updates. It now
includes indexed PDF files, joining Google and FAST
as the third search engine to offer access to these
information-rich files. Searchers can use the filetype:pdf
syntax or the advanced search page to limit to PDFs.
The main page and logo have been redesigned. It has
fewer ads, having removed pop-up and pop-under ads
in August, as well as the graphic banner ad from its
home page. It plans on increasing the freshness of
the database by refreshing about half of the results
that users retrieve on a roughly daily basis. It has
increased the size of the database slightly to a bit
under 1 billion Web pages and 250 million images. Its
international focus has expanded with the introduction
of Prisma suggestion technology into French, German,
Italian, and Spanish and the expansion of the News
search to German.
BoardReader is a search engine that searches
Web-based discussion forums, which are often not indexed
by other search engines. BoardReader accepts phrase
searching and truncation with an asterisk. Results
include a cached copy, the date, and the number of
replies.
GigaBlast, a new search engine launched last
summer, is now offering a site search product and has
launched a Swedish/Scandinavian version at www.gigablast.nu.
While the Swedish version uses the same database, it
adds a Swedish pages limit. There is also more attention
to the design of the site, but the advanced search
does not have as many options.
Google now claims to provide access to over
3 billion Web documents. Researchers [http://cyber.law.harvard.edu/filtering/google/] have
also discovered that the www.google.fr and www.google.de international
versions have excluded certain Web sites to avoid legal
problems with laws in those countries. Two new country
domains have been added—Poland and Thailand.
Inktomi has sold its enterprise search software
(formerly known as Ultraseek) to Verity, leaving Inktomi
to focus almost exclusively on Web searching. It has
also launched a new database that it claims includes
3 billion records, added spell checking, changed to
a keyword-in-context (KWIC) display for some records,
greatly increased the freshness of the database,
and aggressively removed dead links. It has introduced
a relevance technology to help provide better results
for ambiguous terms such as "york" or "mexico" so that
top-ranked results will not be for "new york" or "new
mexico."
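
For readers unfamiliar with the keyword-in-context (KWIC) display that several engines now use, the following Python sketch shows the general idea: the excerpt comes from the text surrounding the first match of a search term, not from the top of the page. The function name kwic_snippet and the width parameter are hypothetical, and the engines' actual excerpting rules are not public.

```python
import re
from typing import Optional


def kwic_snippet(text: str, term: str, width: int = 30) -> Optional[str]:
    """Return the first occurrence of `term` with `width` characters
    of context on each side, or None if the term does not occur."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    # Mark truncation with an ellipsis on whichever side was cut.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


page = "The library offers workshops on search engines every spring."
print(kwic_snippet(page, "search", width=10))
```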
MyWay is a new portal from the Excite Networks. MyWay.com boasts
that it has no banners or pop-ups. The portal content
is similar to that at Excite and iWon, and the search
engine and directory come from Google and Google's
version of the Open Directory. It is one of the few
Google partners to include the cached links in the
results.
Teoma has improved its phrase searching so
that it now does exact matches, also adding an OR operator
that must be in all upper case letters. Without user-specified
nesting, a simple x y OR z is processed
as (x AND y) OR z. Teoma has added a spell check feature,
in beta, for common English words but not proper names.
It has updated the database and expanded it by 60 percent
to about 350 million records. It now uses site collapsing
so that only the first two hits per domain are listed
with others under a "More results from" link. The results
now use a keyword-in-context (KWIC) display, and stop
words are searched if occurring within a phrase search.
It has added field searches using the prefixes of intitle:,
inurl:, and site:. An advanced search page should be
available soon to make these even easier to use.
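
The grouping rule described above (adjacent terms are ANDed before an upper-case OR is applied) can be sketched in a few lines of Python. This is only an illustration of the precedence as observed, not Teoma's parser. It ignores phrases and user-supplied parentheses, and the function name group_query is hypothetical.

```python
def group_query(query: str) -> str:
    """Show how an un-nested query groups when the implicit AND binds
    more tightly than an upper-case OR."""
    groups = [[]]
    for token in query.split():
        if token == "OR":  # only the upper-case form is an operator
            groups.append([])
        else:
            groups[-1].append(token)
    groups = [terms for terms in groups if terms]

    parts = []
    for terms in groups:
        joined = " AND ".join(terms)
        # Parenthesize only when it disambiguates the grouping.
        if len(terms) > 1 and len(groups) > 1:
            joined = "(" + joined + ")"
        parts.append(joined)
    return " OR ".join(parts)


print(group_query("x y OR z"))  # (x AND y) OR z
print(group_query("x y or z"))  # lower-case "or" is just another term
```

Searchers who want x AND (y OR z) instead would need to supply the nesting themselves.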
The Wayback Machine has launched a "document
compare" feature that uses DocuComp technology to compare
two historical Web pages and highlight the differences.
Look for "Compare Archive Pages" in tiny print
in the upper right-hand corner after the search box
on a search results page to try out this feature.
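
The idea of comparing two archived versions of a page can be approximated with Python's standard difflib module. This is a stand-in for illustration only. DocuComp is a separate commercial technology, and nothing here reflects how the Wayback Machine implements the feature. The function name changed_lines and the sample text are hypothetical.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list:
    """Return only the lines that differ between two versions,
    prefixed with "- " (removed) or "+ " (added)."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? ");
    # keep just the additions and removals.
    return [line for line in diff if line.startswith(("+ ", "- "))]


old_page = "Welcome\nHours: 9-5\nContact us"
new_page = "Welcome\nHours: 8-6\nContact us"
for line in changed_lines(old_page, new_page):
    print(line)
```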
Yahoo! finally announced a renewal with Google
for search engine results. While the "Powered by Google" logo
is gone from the top, the results actually rely more
heavily on Google than previously. A few directory
category matches and sponsor matches come first, but
then comes a new section labeled "Web Matches." This
replaces the old "Web Sites," which were entries from
the Yahoo! directory, and the "Web Pages," which were
from Google. The new "Web Matches" mix the two, putting
them in Google relevance order. Those items in the
directory will use the directory summary and title
rather than Google's and have a small red arrow that
links to the category. The advanced search has also
changed significantly. It now looks much more like
the Google advanced search. A direct link to the Yahoo!
directory itself is now available [dir.yahoo.com].
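
The way the new "Web Matches" section mixes the two sources can be pictured with a small Python sketch. It assumes the results arrive as a list already in Google relevance order and that directory entries can be looked up by URL. The function name web_matches, the field names, and the sample data are all hypothetical, and Yahoo! has not published how the merge is actually done.

```python
def web_matches(google_results, directory_entries):
    """Keep Google's ordering, but show the directory's title and
    summary (plus a category link) for pages listed in the directory."""
    merged = []
    for result in google_results:  # assumed to be in relevance order
        entry = directory_entries.get(result["url"])
        if entry is not None:
            merged.append({
                "url": result["url"],
                "title": entry["title"],
                "summary": entry["summary"],
                "category": entry["category"],  # target of the red arrow
            })
        else:
            merged.append({**result, "category": None})
    return merged


google_results = [
    {"url": "http://a.example/", "title": "A (Google)", "summary": "g1"},
    {"url": "http://b.example/", "title": "B (Google)", "summary": "g2"},
]
directory_entries = {
    "http://b.example/": {
        "title": "B (Directory)",
        "summary": "d2",
        "category": "Reference > Libraries",
    },
}
for match in web_matches(google_results, directory_entries):
    print(match["title"], "|", match["category"])
```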
Greg R. Notess (greg@notess.com; www.notess.com)
is a reference librarian at Montana State University and
founder of SearchEngineShowdown.com.
Comments? Email the editor at marydee@infotoday.com.