






Online Magazine
Vol. 28 No. 3 — May/June 2004
DEPARTMENTS
Internet Search Engine Update
by Greg R. Notess
Reference Librarian, Montana State University

Internet Search Engine Update goes up on the Web at http://www.onlinemag.net as soon as it is written, approximately one month before the print issue mails to subscribers.

The big change in the search engine space this time is Yahoo!'s launch of its own search engine. Instead of relying on Google to provide the bulk of its search results, Yahoo! now has its own Inktomi-based search engine database. Many of the other changes and announcements appear to be a reaction to it.

AltaVista and AlltheWeb, although now owned by Yahoo!, continue to have their own unique databases and separate search features. Although their image, news, and media databases have been combined, their main Web page databases remain separate and continue to be updated. Yahoo! claims that all will eventually share the same Yahoo! database. That was originally supposed to happen late in 2003, but it has not yet occurred. The good news is that even when (and if) these engines do have a common database, the advanced search features at each are supposed to continue to be available.

Ask Jeeves has announced that it will be purchasing Interactive Search Holdings, a company that includes the MyWay.com, My Search, My Web Search, iWon, and Excite search sites. These sites currently use either Google results or offer results from several search engines. Once the purchase is completed, these sites are likely to be switched over to Ask Jeeves/Teoma search results. Ask Jeeves has also suspended its Index Express paid submission service for large (over 1,000 URLs) sites, although it continues to accept paid submission from smaller sites. It still has no free submission, although its Teoma search engine (which also provides the bulk of results for Ask Jeeves) has been quite successful at finding most sites, even without the free submission.

Gigablast continues to add new features. The latest additions are the links to archived pages for each of the URLs in a results list. This is in addition to the cached archive of that page. The new links are labeled as "older copies" and link directly to the Internet Archive's Wayback Machine. Gigablast has added a related concept section called Giga Bits, which displays at the top of the search results page. One more small change—Gigablast now has stop words. Very common words, such as "a," "is," and "the," will not be searched. Gigablast does display a message noting which terms have been ignored. Put a "+" in front of terms or include them within quotation marks as part of a phrase search to search them.
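The stop-word behavior described above can be illustrated with a minimal Python sketch. This is not Gigablast's code: the function name `parse_query` is hypothetical, and the stop-word list is an assumption that contains only the three examples named in this column. The sketch simply shows how a "+" prefix or a quoted phrase keeps a common word in the search while other stop words are set aside and reported.

```python
import re

# Assumed stop-word list: only the three examples named in the column.
STOP_WORDS = {"a", "is", "the"}

def parse_query(query):
    """Split a query into searched terms and ignored stop words.

    Quoted phrases and terms prefixed with "+" are always searched.
    (Hypothetical helper for illustration only.)
    """
    searched, ignored = [], []
    # Each token is an optional "+" followed by a quoted phrase or a bare word.
    for plus, phrase, word in re.findall(r'(\+?)(?:"([^"]+)"|(\S+))', query):
        if phrase:
            searched.append(phrase)      # phrase searches keep every word
        elif plus or word.lower() not in STOP_WORDS:
            searched.append(word)        # "+" forces a stop word to be searched
        else:
            ignored.append(word)         # reported back to the searcher
    return searched, ignored

searched, ignored = parse_query('the history of +the "to be is"')
print("Searched:", searched)
print("Ignored:", ignored)
```

The ignored list corresponds to the message Gigablast displays noting which terms were dropped.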

Google announced an expanded Web database (from 3.3 billion to 4.285 billion) and an enlarged image database (doubled to about 880 million images). Google's last announced increase, to 3.3 billion, came a few days after AlltheWeb announced a larger number than Google's previous number; similarly, this Google announcement came on the same day that Yahoo! announced it was dropping Google and launching its own database. That said, Google's database growth is still significant, although on several actual searches it does not seem to find much more than it did before the announcement. On a few, Yahoo! finds even more results than Google. Yet for most searches I tested, Google still retrieves more results than the others. In other Google developments, the Google news alerts have expanded beyond English to French, German, Italian, and Spanish. Google Labs has a new experiment providing access to the Froogle shopping engine via wireless devices such as mobile phones and PDAs. Beyond its automatic stemming, Google is also sometimes searching for English synonyms of query words. Use a "+" in front of each term to turn off the automatic stemming and synonym searching. And lastly, the site: search now works by itself and no longer requires the addition of another search term.
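For searchers who build queries in a script, the "+" tip above could be applied mechanically. The following is a small Python sketch under stated assumptions: the helper name `exact_terms` is hypothetical, and the rule for what counts as an operator (anything containing a colon, or starting with "+", "-", or a quotation mark) is a simplification for illustration, not Google's own parsing.

```python
import re

def exact_terms(query):
    """Prefix each plain word with "+" so it is searched as typed.

    Quoted phrases, field operators such as site:, and terms that
    already start with "+" or "-" are left alone. (Hypothetical helper.)
    """
    out = []
    # Tokens are either whole quoted phrases or runs of non-space characters.
    for term in re.findall(r'"[^"]*"|\S+', query):
        if term.startswith(("+", "-", '"')) or ":" in term:
            out.append(term)
        else:
            out.append("+" + term)
    return " ".join(out)

print(exact_terms("montana libraries site:montana.edu"))
```

The domain in the usage line is only a placeholder; any site: value would pass through unchanged.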

Lycos is changing again. Its new interface is designed around the idea of social networking, a recent hot topic on the Internet. While much of the content on the older site remains, including search, the focus is now more on personal publishing with blogs and home pages, along with searching for people and groups. At this point, the general Web search remains and continues to be powered by the same database used by AlltheWeb. Lycos-owned HotBot continues to be focused just on search and offers access to the HotBot (Inktomi), Lycos (AlltheWeb), Ask Jeeves, and Google databases. Additionally, Lycos launched the HotBot Desktop, a browser-based search toolbar that enables Web, individual hard drive, and RSS feed searching.

MSN Search no longer includes results from LookSmart. In the past, directory listings from LookSmart came before the Inktomi search engine results. Now, MSN Search uses only the Inktomi results. It has also launched its own toolbar. MSN continues to experiment with various beta versions of its own search engine, and now that Yahoo! has launched its own version, many expect to see a new MSN search engine sometime in 2004.

Yahoo! launched a new search engine database [http://search.yahoo.com] that no longer uses Google results. Instead, its results appear to draw in part on an Inktomi-based database, but they differ from those of other Inktomi-based search engines, including MSN Search and HotBot. One major useful addition is that cached copies of pages continue to be available, which makes this the first major search engine beyond Google to include the feature. In a similar way, HTML versions of PDF and other file types are also available. According to Yahoo!, only the first 500 KB of a document is indexed, which is better than Google's 101 KB but still short of the full-document indexing available at AlltheWeb. The new search appears to handle full Boolean searching using AND, OR, NOT, and parentheses for nesting. It also uses field searches similar to Google's, such as intitle: and inurl:, along with site:, link:, hostname:, and url:. The image database continues to pull from Google at this point. In addition, Yahoo! announced its new Content Acquisition Program, which is designed to help both noncommercial and commercial content providers get more Web resources into the Yahoo! database. On the noncommercial side, Yahoo! is working with sites like the Library of Congress, NPR, Project Gutenberg, and UCLA's Cuneiform Digital Library Initiative to make sure their content is included in the database. Inclusion is not supposed to change relevance ranking, but it may help move more material from the invisible Web into the database.
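To show how the Boolean operators and field searches described above might be combined, here is a short Python sketch that assembles a query string. The helper names (`any_of`, `all_of`, `field`, `exclude`) and the sample terms and domain are hypothetical, chosen only for illustration. The exact syntax Yahoo! accepts may differ from what this column's description suggests, so treat the output as an example of the pattern rather than a guaranteed working query.

```python
def any_of(*terms):
    """Join terms with OR, wrapped in parentheses so the group nests safely."""
    return "(" + " OR ".join(terms) + ")"

def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)

def field(name, value):
    """Prefix a value with a field operator such as intitle: or site:."""
    return f"{name}:{value}"

def exclude(query, term):
    """Append a NOT clause to an existing query."""
    return f"{query} NOT {term}"

# Sample terms and domain are placeholders, not taken from the column.
query = exclude(
    all_of(
        any_of("library", "archive"),
        field("intitle", "cuneiform"),
        field("site", "ucla.edu"),
    ),
    "commercial",
)
print(query)
```

Keeping the OR group in parentheses mirrors the nesting the column mentions, so the AND and NOT clauses apply to the group as a whole.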


Greg R. Notess (greg@notess.com; www.notess.com) is a reference librarian at Montana State University and founder of SearchEngineShowdown.com.

Comments? Email the editor at marydee@infotoday.com

