[ONLINE]
web search feature

Specialized Search Engines: Alternatives to the Big Guys

David King

ONLINE, May 2000
Copyright © 2000 Information Today, Inc.


What do Ally McBeal, chicken soup, and the structure of DNA have in common?

Answer: You can find information about them on the Internet by using a search engine.

As of January 2000, many major search engines had indexes containing hundreds of millions of pages. With so many pages indexed, searchers can now find almost anything (some of which is actually useful research) in almost any imaginable field. But sometimes more is not better. At times, you may wish for a simpler search engine that finds only on-topic information. Now you can have one: a specialized search engine.

Think of a mainstream search engine as a Super Wal-Mart, and a specialized search engine as the specialty shop down the street. A specialized search engine focuses on a specific subject, a geographic region, or a certain computer file format. Specialized search engines therefore tend to index fewer Web pages, but in the process they also weed out information that isn't useful for a particular topic. As a result, on-target pages are more likely to appear near the top of your results page.

Another difference between large search engines and specialized search engines is human involvement. Many specialized search engines employ subject specialists who actually gather, rank, and annotate each link. So entries are not only filtered to be subject-specific; the filtered entries are winnowed even further so that only useful information remains.

WHY USE A SPECIALIZED SEARCH ENGINE?

If what you want can be found using a major search engine, why go specialized? Here are four reasons serious searchers should consider:
  1. Save Time--A specialized search engine focuses on a niche subject, so there's a good chance you'll find what you need in less time.

  2. Vetted Databases--Since many specialized search engines choose their entries manually, you'll be spared the frustration of clicking on a useful-sounding entry, only to find a page created by a 6-year-old for his homeroom class project.

  3. Unique Entries--Many specialized search engines gather useful sites from user submissions, rather than through the normal spider/robot process. There's a very good chance that some of these entries are unique to the specialized search engine.

  4. Annotations--Many specialized search engines annotate and, in some cases, rank the sites they index.

There are specialized search engines for many different categories. This article highlights groups of search engines for a number of the more popular categories, including the emerging multimedia search services.

HEALTHCARE

Health-oriented search engines cover all aspects of the health and medical world, including diseases, medical specialties, the business side of healthcare, medical computing, and alternative medicine. Each search engine described below focuses mainly on professional information, with less emphasis on consumer-oriented information.

Achoo: Healthcare Online

Achoo (http://www.achoo.com/) has been around since April 1996, and is maintained by MNI Systems Corporation. Achoo's goal is to be "the most comprehensive healthcare information site on the Internet." To meet this goal, each of its indexed sites focuses on at least one aspect of healthcare, such as clinical health, alternative health, or the business side of the healthcare field.

All submissions to Achoo go through a review process. According to Adam Kruszynski, a Web site administrator at Achoo, a junior site administrator examines every submission. Descriptions are edited, and entries are placed under the proper Medical Subject Headings (MeSH) heading. If a site is difficult to classify, it is referred to a consultant for classification. This process ensures that all entries are medically related, have medical content, are regularly maintained, and are ethically sound.

There are three ways to search Achoo: browsing, basic search, and power search. Achoo's directory is based on the National Library of Medicine's MeSH heading system. This makes browsing relatively easy for an experienced health searcher, since linked information is appropriately classified.

Basic searching offers many advanced features, including phrase searching and All or Any searches. Each Basic search can be limited to the full record, URL, title, description, or keywords. Advanced (power) searching, available by clicking the search folder tab at the top of the main page and then clicking on "Search Achoo," allows AND/OR Boolean searching, phrase searching, and searching of site names and descriptions. Searches can also be limited by geographic location.

You can also apply what Achoo calls "Site Content Qualification Filters," which allow very specific content limits, such as companies, conferences, newsgroups, or peer-reviewed information. Each Achoo record lists a title, a brief summary, and a country designation.
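To make the difference between All, Any, and phrase searching concrete, here is a minimal sketch of how a search limited to one field might behave. It is not Achoo's code: the `Record` layout, the `matches` function, and the two sample records are hypothetical, chosen only to mirror the field limits described above (full record, URL, title, description, keywords).

```python
import re
from dataclasses import dataclass

# Hypothetical record layout; the field names follow the limits described
# above (title, URL, description, keywords), not Achoo's actual schema.
@dataclass
class Record:
    title: str
    url: str
    description: str
    keywords: str

FIELDS = ("title", "url", "description", "keywords")

def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def field_tokens(record: Record, field: str) -> list[str]:
    """Words from one field, or from every field when field == "full"."""
    if field == "full":
        return tokens(" ".join(getattr(record, f) for f in FIELDS))
    return tokens(getattr(record, field))

def matches(record: Record, query: str, mode: str = "all", field: str = "full") -> bool:
    """Apply an All, Any, or phrase match, limited to a single field."""
    words = field_tokens(record, field)
    terms = tokens(query)
    if mode == "all":
        return all(t in words for t in terms)
    if mode == "any":
        return any(t in words for t in terms)
    if mode == "phrase":
        # The terms must appear consecutively and in the same order.
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    raise ValueError(f"unknown mode: {mode}")

# Two invented example records.
records = [
    Record("Heart Disease Overview", "http://example.org/heart",
           "Clinical guide to coronary heart disease", "cardiology"),
    Record("Alternative Medicine Links", "http://example.org/alt",
           "Herbal remedies and acupuncture", "alternative health"),
]

print([r.title for r in records if matches(r, "heart disease", "phrase", "title")])
# ['Heart Disease Overview']
```

With the query "acupuncture cardiology," an Any search over the full record returns both sample records, while an All search returns neither, which is why All searches narrow results so quickly.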

HealthAtoZ.com: Your Family Health Site

HealthAtoZ (http://healthatoz.com/) was created in 1995 by a group of physicians, nurses, and pharmacists, and attempts to cater to both consumers and professionals. A search in HealthAtoZ will retrieve information from two sources: a reviewed Web site directory with over 50,000 entries, and a spider that retrieves items from the Internet.

A team of medical catalogers reviews each entry in the reviewed Web site directory. Each listed site is ranked using a three-star to five-star system. The ranking system grades sites on content (1-20 points), ease of use (1-10 points), layout (1-10 points), and level of appeal (1-10 points).
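The point ranges above add up to a possible total of 4 to 50 points, with content alone worth up to 20 of them (40 percent). The following sketch only totals and validates the four scores; the article does not say how totals map to three, four, or five stars, so no star cut-offs are assumed, and the sample scores are invented.

```python
# Point ranges as reported for HealthAtoZ's review criteria.
CRITERIA = {
    "content": (1, 20),
    "ease_of_use": (1, 10),
    "layout": (1, 10),
    "appeal": (1, 10),
}

MIN_TOTAL = sum(low for low, _ in CRITERIA.values())    # 4
MAX_TOTAL = sum(high for _, high in CRITERIA.values())  # 50

def total_score(scores: dict) -> int:
    """Check each criterion against its allowed range and add them up."""
    for name, (low, high) in CRITERIA.items():
        if not low <= scores[name] <= high:
            raise ValueError(f"{name}={scores[name]} is outside {low}-{high}")
    return sum(scores[name] for name in CRITERIA)

# Invented scores for a hypothetical site.
example = {"content": 18, "ease_of_use": 8, "layout": 7, "appeal": 9}
print(total_score(example), "of", MAX_TOTAL)  # 42 of 50
```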

You can also search by entering one or more words into a text box, which returns results from both the reviewed Web site directory and the Internet, in the following order:

  1. Titles that include the word (or the whole phrase, if more than one word was entered)

  2. Titles including any one of the keywords

  3. Records including the word or phrase anywhere in the text
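The three-level ordering above can be sketched as a simple tiered sort. This is an illustration under assumptions, not HealthAtoZ's implementation: records are reduced to hypothetical (title, body text) pairs, and records that fall outside all three tiers are simply dropped.

```python
import re
from typing import Optional

def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(tokens: list[str], phrase: list[str]) -> bool:
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def tier(title: str, body: str, query: str) -> Optional[int]:
    """Return 1, 2, or 3 for the three result groups, or None for no match."""
    terms, title_words = words(query), words(title)
    if has_phrase(title_words, terms):
        return 1  # the title contains the word (or the whole phrase)
    if any(t in title_words for t in terms):
        return 2  # the title contains at least one of the keywords
    if has_phrase(words(body), terms):
        return 3  # the word or phrase appears elsewhere in the text
    return None

def order_results(records: list[tuple[str, str]], query: str) -> list[str]:
    """Sort (title, body) records by tier, keeping original order within a tier."""
    ranked = [(tier(title, body, query), i, title)
              for i, (title, body) in enumerate(records)]
    return [title for _, _, title in sorted(r for r in ranked if r[0] is not None)]

# Invented sample records.
records = [
    ("Living with asthma", "A guide to childhood asthma triggers."),
    ("Childhood asthma clinic", "Pediatric respiratory care."),
    ("Allergy news", "New research on childhood asthma and pollen."),
    ("Diabetes basics", "Blood sugar monitoring."),
]
print(order_results(records, "childhood asthma"))
# ['Childhood asthma clinic', 'Living with asthma', 'Allergy news']
```

Because the tier is the first element of each sort key, a title match always outranks a match buried in the text, regardless of how often the words appear there.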

Each record found lists the title, ranking (if ranked), a brief review, a relevancy ranking, record type (professional versus consumer), and a URL. Related category subject headings are also provided, which are linked to other items in the reviewed Web site directory.

MedHunt: Medical Document Finder

MedHunt (http://www.hon.ch/MedHunt/) is produced by Health On the Net (HON) Foundation, located in Geneva, Switzerland, and was started in September 1996. This database is similar to HealthAtoZ in that it searches two areas--its own reviewed database, and the rest of the Web. It has international coverage, and results are not limited to English.

Records appearing in the reviewed database are given a "HON Code of Conduct" icon. Web sites are included in the HON database only if they meet certain qualifications: for example, information must be provided by a qualified health professional, it must support the relationship between patient and physician, and modification dates, authors, and contact information must be clearly displayed.

Records not found in the HON database are provided by MARVIN, a robot that searches the Internet using a 12,000-word medical dictionary. Each word in the dictionary is weighted according to its relevance and specificity to the medical field. When a Web site scores as sufficiently relevant against these weights, it is automatically indexed using the dictionary.
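The general idea of weighted-dictionary filtering can be sketched in a few lines. Everything specific here is an assumption: the five-word dictionary, its weights, and the cut-off of 5.0 are invented stand-ins, since the article gives neither MARVIN's weights nor its threshold. The sketch scores a page by summing the weights of the dictionary words it contains and, if the page clears the threshold, adds it to a simple term-to-URL index.

```python
import re

# Invented stand-in for MARVIN's 12,000-word dictionary: these terms, weights,
# and the threshold are illustrative assumptions, not HON's actual values.
MEDICAL_TERMS = {
    "myocardial": 3.0,
    "infarction": 3.0,
    "oncology": 2.5,
    "dosage": 1.5,
    "patient": 0.5,  # a common word, so only weak evidence of medical content
}
THRESHOLD = 5.0

def score_page(text):
    """Sum the weight of every dictionary-word occurrence in the page."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = [t for t in tokens if t in MEDICAL_TERMS]
    return sum(MEDICAL_TERMS[t] for t in found), sorted(set(found))

def index_if_relevant(url, text, index):
    """Add the page to an inverted index (term -> URLs) if it scores high enough."""
    score, terms = score_page(text)
    if score < THRESHOLD:
        return False
    for term in terms:
        index.setdefault(term, set()).add(url)
    return True

index = {}
index_if_relevant("http://example.org/mi",
                  "Acute myocardial infarction: dosage guidance for the patient", index)
index_if_relevant("http://example.org/parking", "Patient parking information", index)
print(sorted(index))  # ['dosage', 'infarction', 'myocardial', 'patient']
```

Weighting specific terms such as "infarction" far above general ones such as "patient" is what keeps a hospital parking page out of the index even though it mentions patients.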

MedHunt allows Basic and Advanced searching. The Basic search feature allows Any, All, and Adjacent word searching, and can limit results to a geographic location or to records in the HON database only. The Advanced search feature adds three more search boxes and allows guided Boolean AND/OR linking of the boxes. Each record includes a relevance score, the title and URL of the site, a listing of linked keywords, a brief description, the HON Code icon (for sites in the HON database), and a country and language designation.

Medical Matrix

The Medical Matrix project (http://www.medmatrix.org/) is "devoted to posting, annotating, and continuously updating full content, unrestricted access, Internet clinical medicine resources." Its database is aimed at physicians and health workers in the United States.

Medical Matrix is the smallest of the search engines in this article, weighing in at only 4,481 records (as of November 29, 1999). Those records go through a multi-step ranking process by an editorial board, made up mostly of physicians and health workers. They rank each site in terms of quality, peer review, full content, multimedia features, and unrestricted access. Records are given a star rating of one to five based on the ranking process.

There are two search options available for Medical Matrix users: a search box and a directory. The search box is very simple, with no help pointers. The directory is loosely arranged by MeSH category headings, and seems more useful than the search box.

Once you've searched using either the search box or the directory, you'll be presented with a list of results. Each record provides a title, a partial summary of the site, the star rating of the site, and a link to "Details." The "Details" link provides more information: title, description, classification (MeSH headings), rating, contact, URL, keywords, date entered, and date last updated.

Searchers using Medical Matrix have to go through a free registration process before using the search engine for the first time.

LEGAL

Legal-oriented search engines cover a wide variety of information. Some find primary sources like federal and state codes, and some provide stock answers to legal questions. Of the five described in this article, two are very powerful search engines, two have searchable directories, and one is actually maintained by a law firm.

FindLaw

FindLaw (http://findlaw.com/) was launched in January 1996, with a mission to "make legal information easy to find." To do this, FindLaw focuses on primary sources, such as codes, case law, and regulations; secondary sources, like law journals and commentary; and tertiary sources, like mailing lists and USENET discussion groups. You can also find directory information on lawyers, agencies, and organizations, as well as up-to-date legal news.

There are three ways to search FindLaw. To browse, simply choose an appropriate topic from the directory on the main page, or choose "For Lawyers," "For Students," "For the Public," or "For Business" to narrow your browsing to a specific audience.

To search the FindLaw directory, enter a search using the search box at the top of the page. Entering more than one word defaults to a phrase search. Other options, such as AND, OR, NOT, NEAR, and wildcard searching, are also available. Records found by browsing or searching FindLaw's directory display the title, URL, a short summary, and a clickable See Also subject listing.

LawCrawler

The third way to search FindLaw is by using LawCrawler (http://lawcrawler.findlaw.com/). This robot combines intelligent agents, the AltaVista search engine, and other legal code and case law databases to create an extremely powerful legal Web search.

To search LawCrawler, enter a word or phrase in the search box. LawCrawler allows the AND, OR, NOT, and NEAR Boolean operators. You can also limit a LawCrawler search to specific databases. "World Wide Sites with Legal Information" is the default database, which searches using AltaVista and FindLaw's intelligent operators. Other databases include Legal News, Legal Dictionary, Law Reviews, Mailing List Archives, U.S. Constitution, U.S. Code, Supreme Court Opinions, and All Federal Circuits. Results found using LawCrawler include a title, a brief summary, the site's URL, document size (in bytes), and a revision date.
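For readers new to Boolean operators, the sketch below shows what AND, OR, NOT, and NEAR each test for in a single document. It is not LawCrawler's or AltaVista's code: queries are written as nested tuples instead of being parsed from typed text, and the NEAR window of 10 words is an assumption exposed as the `near_window` parameter.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def evaluate(query, tokens, near_window=10):
    """Evaluate a query tree such as ("AND", "tenant", ("NOT", "lease"))."""
    if isinstance(query, str):
        return query.lower() in tokens
    op, *args = query
    if op == "AND":
        return all(evaluate(a, tokens, near_window) for a in args)
    if op == "OR":
        return any(evaluate(a, tokens, near_window) for a in args)
    if op == "NOT":
        return not evaluate(args[0], tokens, near_window)
    if op == "NEAR":
        # Both words must occur within near_window words of each other.
        first, second = (a.lower() for a in args)
        pos1 = [i for i, t in enumerate(tokens) if t == first]
        pos2 = [i for i, t in enumerate(tokens) if t == second]
        return any(abs(i - j) <= near_window for i in pos1 for j in pos2)
    raise ValueError(f"unknown operator: {op}")

doc = tokenize("The tenant may terminate the lease if the landlord breaches the contract.")
print(evaluate(("AND", "tenant", ("NOT", "lease")), doc))  # False
print(evaluate(("NEAR", "tenant", "contract"), doc))       # True (positions 1 and 11)
```

NEAR is stricter than AND: "tenant" and "contract" both appear, but shrinking the window to 5 words makes the same NEAR query fail, which is how proximity searching weeds out documents where the terms are unrelated.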

LawGuru

LawGuru (http://www.lawguru.com/) is maintained by the law offices of Eslamboly and Barlavi, located in Los Angeles, California. LawGuru features many unique ways to search for legal information. You can use the Legal Search Engines page, which allows searching of over 435 search engines, including others listed in this article. One can also search using the new Multiple Resource Research Tool. This tool focuses on state and national codes and state and federal court opinions, and allows simultaneous searching in more than one database. LawGuru's FAQ Collection features several hundred pre-written legal questions and answers in dozens of categories. You can also search the LawGuru BBS, which currently has over 13,000 questions and answers. Attorneys who have joined LawGuru's attorney network generate the answers.

Internet Legal Research Guide

The Internet Legal Research Guide (ILRG, at http://www.ilrg.com/) was created in 1995 and, like the other legal search engines, strives to be a comprehensive resource of Internet-based legal information. There are four ways to search ILRG. The first is to search ILRG's main categorized index of over 4,000 national and international law-related Web sites. Enter a word or phrase into the search box, which lets you search the whole site or narrow your search to academic, professional, or government sites.

The second way to search ILRG is to use the enhanced search feature. This feature adds an assisted AND/OR feature, and allows URL searching and keyword exclusion. The third way to search ILRG is to browse the directory. The directory, called the Annotated Index of Features, is available on ILRG's main page. Records found using the first three search methods include a link to the site and, in some instances, a brief summary.

LawRunner: A Legal Research Tool

The fourth way to search ILRG is to use LawRunner: A Legal Research Tool (http://www.lawrunner.com/). This database is similar to FindLaw's LawCrawler in that it uses a series of advanced query templates to search the AltaVista search engine. LawRunner is set up to find international and U.S. legal information using AltaVista's advanced-search Boolean features.

SCIENCE

The specialized nature of science is reflected in the search engines devoted to it. Two of the sites described here cover a broad spectrum of scientific fields, and one focuses on biology.

SciSeek: Your Online Science and Nature Resource

SciSeek (http://www.sciseek.com/) is a guide to science and nature Web sources. It has a broad focus; sources range from agriculture to zoology. There are two ways to search SciSeek: browsing the directory and using the search box. The directory is arranged by scientific field, such as botany, chemistry, or physics. These fields, as well as some subheadings for each field, are located on the main page. Clicking on one of these top-level links takes you to a subject directory for that particular field. The subheadings under each category show how many links each one contains.

The search feature is rather powerful. At first glance, the search box (located on the main page) appears very simple, allowing only Any and All limits. However, one can also search using more powerful options such as +/-, AND/OR, phrase searching, limiting to URL and mailto, and wildcard searching.
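The +/- and wildcard options can be illustrated with a small matcher. This is a sketch under stated assumptions, not SciSeek's engine: "+" marks a required term, "-" an excluded term, and wildcards follow Python's `fnmatch` conventions (`*` for any run of characters, `?` for one character), which may differ from the syntax SciSeek actually accepted.

```python
import fnmatch
import re

def parse(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    return required, excluded, optional

def term_in(pattern, tokens):
    # fnmatch treats * as any run of characters and ? as any single character.
    return any(fnmatch.fnmatchcase(tok, pattern) for tok in tokens)

def matches(query, text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    required, excluded, optional = parse(query)
    if not all(term_in(p, tokens) for p in required):
        return False
    if any(term_in(p, tokens) for p in excluded):
        return False
    # With no required terms, at least one optional term must appear;
    # otherwise optional terms would only affect ranking (not modeled here).
    return bool(required) or any(term_in(p, tokens) for p in optional)

print(matches("+fossil* -dinosaur", "Fossilized plant remains from the Jurassic"))  # True
print(matches("+fossil* -dinosaur", "Dinosaur fossils of Montana"))                # False
```

The second page is rejected even though "fossils" matches the wildcard, because a single excluded term overrides everything else.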

Results include a title, a user rating and the number of votes for that rating, a summary of the site, a link to user comments, the number of user comments provided, and a "More Like This" link. Users can rate each site on a 1-10 scale; these ratings are then averaged and displayed on the results page. You can also leave a User Comment for each site.

BioCrawler: The Life Science Search Engine

BioCrawler (http://www.biocrawler.com/), created in 1995 and maintained by the German company BioFacts, focuses on biology. It also covers related disciplines such as anthropology, biotechnology, and paleontology. According to Nils Koesters, Managing Director of BioFacts, BioCrawler currently indexes around one million pages.

Searching is performed through a search box or the directory. The directory can be accessed on the main page by clicking on the appropriate subject field. This displays the subheadings in that category, along with the number of pages found under each heading. Searching via the search box produces similar results.

Records found in BioCrawler include a title, the URL, a relevancy percentage, a summary taken from the site, and page size in bytes. The number of links on the page, the number of citations to the page (links from other Web sites to the page), and a numerical rank are also given.

Records in BioCrawler are ranked according to the links pointing to each page. Top-ranked pages are those that are linked to, or cited by, many other pages. Koesters also indicated that pages linked to those top-ranked pages are highly ranked as well.
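The article does not give BioCrawler's formula, so the sketch below swaps in a well-known technique with the same flavor: a PageRank-style iteration, in which each page passes a share of its own score to the pages it links to. It shows why a page cited by a top-ranked page outranks a page with the same number of citations from an obscure one. The link graph, the damping factor of 0.85, and the 50 iterations are all illustrative assumptions.

```python
def citation_counts(links):
    """Count inbound links (citations) for every page in the graph."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

def link_rank(links, damping=0.85, iterations=50):
    """PageRank-style scores: a page inherits rank from the pages citing it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[source] / n
        rank = new
    return rank

# Invented graph: A, B, and D all cite C; C cites E; F cites G.
links = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["E"], "F": ["G"]}
counts = citation_counts(links)
rank = link_rank(links)
print(counts["E"], counts["G"])  # 1 1
print(rank["E"] > rank["G"])     # True: E is cited by the highly ranked C
```

E and G each have exactly one citation, so a plain citation count cannot tell them apart; weighting each citation by the rank of the citing page can.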

Biolinks

Biolinks (http://www.biolinks.com/) was created in March 1997, and is a broad scientific search engine. The site doesn't provide much information about itself or its ranking system. In fact, more information is available about its new cartoon character that "hosts" the site than about the company or its ranking system. But don't let that stop you from using Biolinks; it's still a good science-oriented search engine.

Biolinks can be searched using a basic or an advanced search box, or by browsing its directory. The basic search feature allows simple keyword searching in either the "entire database" or the "indexed database only." The service doesn't explain the difference between the two databases, but when doing a search, you'll find more sites using the "indexed database only" option. You can also choose between All and Any of the words entered in the search box.

In the Advanced Search mode, one can choose a more specific category (Meetings, Medical Sites, or Associations & Societies); narrow to "indexed, spidered, and entire databases" (again not explained); and narrow to full entries, title, URL, keyword, or page contents. Records found using these search methods provide a title, URL, brief description, keywords, and a contact address.

The directory can be browsed from the main page. You can choose a main topic, like Journals, Meetings, or Software, or choose one of the sub-topics also listed beneath each topic. Records found using the directory provide a linked title to the site.

MULTIMEDIA

Search engines that focus on specific file formats, such as photographs, audio, and video, have appeared in recent years. These search engines can be extremely useful when attempting to locate multimedia information, like corporate speeches, oral histories, or recent broadcast news events. Of course, they can also find the latest MP3 file of a groovy song, a live radio broadcast, or images of favorite cartoon characters.

Scour

Scour (http://www.scour.net/), started in December 1997, indexes over 25 million digital files, including music, video, and images, with a focus on entertainment. There are two ways to search using Scour: Basic and Advanced.

Basic searching allows you to enter keywords into a search box and narrow to audio, video, or images. The Advanced Search page allows +/- searching and offers an Everywhere/Web Site/Share selection. Web Site finds files on the Web. The Share function uses Scour's Media Agent software, which is available as a free download. Media Agent searches publicly accessible servers and allows you to download shared files from these servers. Scour warns, however, that the software is only as reliable as the source: a disconnected or crashed server that holds the multimedia file you're looking for will not be accessible. Additionally, Scour disclaims responsibility for users who download unapproved copyrighted material. According to Scour, "Just remember, we're not responsible for what kind of material you download--we just carve up the pie."

The Advanced feature also allows you to narrow to a specific file type/format, with choices like movie trailers, downloadable music, RealMedia, MP3, Liquid Audio, and Shockwave Animation.

Each item found includes a file name, dimensions (in pixels), file size (in kilobytes), the URL, and a thumbnail (for images and videos). Audio files found using Scour also include file format and a date.

ditto.com

Ditto (http://ditto.com/) has been around since 1997, and provides picture-search services to NBC and CNET's Snap.com. Ditto searches only for images. Ditto employees manually link images with keywords and weed out material deemed objectionable.

Searchers have many options in ditto. Search features include +/-, AND/OR, NOT, AND NOT, single and multiple character wildcards, filename, URL, and title searching. These features can be used in both the Basic and Advanced search areas. The Basic search box provides a text box and a Go button. The Advanced Search box also allows limiting to gif/jpeg files, file size (small, medium, or large), width, height, color depth (black/white, 8-bit, or 24-bit), and images added in the last week, last month, or the last six months.
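Limits like ditto's Advanced Search options act as a series of filters, each of which an image must pass. The sketch below assumes a hypothetical `Image` record whose fields mirror the limits listed above (format, width, height, color depth, date added). It leaves out the small/medium/large size buckets, since the article doesn't define their boundaries, and uses minimum width and height instead; the sample images and dates are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Image:
    filename: str
    fmt: str          # "gif" or "jpeg"
    width: int        # pixels
    height: int       # pixels
    color_depth: str  # "bw", "8-bit", or "24-bit"
    added: date

def filter_images(images, today, fmt=None, min_width=0, min_height=0,
                  color_depth=None, added_within_days=None):
    """Keep only the images that pass every limit that was supplied."""
    kept = []
    for img in images:
        if fmt is not None and img.fmt != fmt:
            continue
        if img.width < min_width or img.height < min_height:
            continue
        if color_depth is not None and img.color_depth != color_depth:
            continue
        if (added_within_days is not None
                and today - img.added > timedelta(days=added_within_days)):
            continue
        kept.append(img)
    return kept

today = date(2000, 5, 1)
images = [
    Image("sunset.jpg", "jpeg", 640, 480, "24-bit", date(2000, 4, 20)),
    Image("logo.gif", "gif", 120, 60, "8-bit", date(2000, 4, 28)),
    Image("diagram.gif", "gif", 800, 600, "bw", date(1999, 12, 1)),
]
recent_gifs = filter_images(images, today, fmt="gif", added_within_days=31)
print([img.filename for img in recent_gifs])  # ['logo.gif']
```

Any limit left at its default is simply skipped, so the same function covers both a bare Basic search and a heavily restricted Advanced one.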

Results appear as thumbnail images. A Details link is included that provides picture attributes such as page title, photo size, width, and height. (Editor's Note: For more information on ditto.com, including its recent court battle over copyright issues, see Paula Berinstein's THE BIG PICTURE, "Image Search Engines and Copyright," November 1999 ONLINE, p. 91.)

Streambox

Streambox (http://www.streambox.com/) is a relatively new multimedia search engine; it started in 1999. Streambox focuses on streaming audio and video, in RealAudio, RealVideo, and Windows Media formats. Within these formats, you can find anything from radio stations and music videos to live television segments or lecture series. Currently, Streambox indexes over one million streams.

Searching is very straightforward. The Basic search allows you to enter a word or words into a search box, and limit by audio, video, and/or live media formats. The Advanced search offers more options, including Any/All limits and field limiting (all fields, title, author, copyright, directory).

Results list format, file size and length, title, author and copyright information, URL, and a user rating. Ratings are based on popularity, and a clip can be rated anywhere from Sweet! (highly rated) to Stinky! (poorly rated). Streambox also provides a "Send Clip" option that lets you send a file to a friend.

WHERE DO I FIND SPECIALIZED SEARCH ENGINES?

There are a number of pointer sites that are helpful for finding specialized search engines. Think of these sites as search engines for the search engines. Danny Sullivan's Search Engine Watch (http://www.searchenginewatch.com/links/) has a listing of search engines. This list links to major search engines, children's search engines, metacrawlers, multimedia, news, regional, and specialty search engines. Specialty search engines found on this site are divided into subjects.

SearchEngineGuide.Com: The Guide to Search Engines, Portals, and Directories, located at http://searchengineguide.com/, is another good place to find specialized search engines. SearchEngineGuide currently indexes 2,243 search engines (as of November 30, 1999). All search engines featured on this list are divided into subject directories. Each entry provides a brief summary.

InvisibleWeb.Com: The Search Engine of Search Engines (http://invisibleweb.com/), created by Intelliseek, is another helpful search engine site. This Web site is one of a few sites devoted to finding information on the "invisible Web" (searchable information resources whose contents cannot be indexed by traditional search engines). Many search engines fall into the "invisible Web," since their index of links is stored in a database rather than on static Web pages. This site also lists all links in a subject directory.

Chris Sherman, the Web Search guide for About.com, maintains a site at http://websearch.about.com/. Aside from a wealth of resource links and current commentary on hot Web search topics, Chris has compiled a long list of categories ranging from Careers and Jobs to Web Site Promotion. Within many of these categories you'll find links to corresponding specialized search engines.

CONCLUSION

With all the media attention given to the large, major search engines (the online world's "mega-marts"), one may not immediately think of using a specialized search engine. However, this type of "specialty shop" can save time and eliminate frustration when you want a small, focused set of specialized hits.


David King (david@kclibrary.org) is Information Technology Librarian at Kansas City Public Library.

Comments? Email letters to the Editor at editor@infotoday.com.

