


 




Online Magazine

Vol. 29 No. 4 — Jul/Aug 2005

On The Net
Scholarly Web Searching: Google Scholar and Scirus
By Greg R. Notess | Reference Librarian, Montana State University

Google introduced a brand-new concept with Google Scholar [http://scholar.google.com]: specialized search aimed at finding scholarly information on the Web. With an initial focus on research articles from publishers participating in the CrossRef project, along with several collections of online preprints and other major scholarly sites, Google established a new way to search a broad range of scholarly literature (although its original coverage was stronger in science and technology than in the social sciences). In true Google fashion, the new search tool not only displayed links to individual documents but also included citation references extracted from other documents using special algorithms developed at Google.

Some librarians decried this poaching of our information space, while Google advocates predicted that Scholar would become the first and only source for research information. We have seen this type of rhetoric before. Remember when Google launched Google Answers back in 2002? The ensuing hue and cry bemoaned how it would compete with library reference services. Google Answers continues as a fee service, but it is certainly not a major Google money-maker, nor has it caused the death of library and information services anywhere.

Is Google Scholar destined for a similar fate? Time will tell whether it becomes a major access tool that replaces some of the traditional indexing and abstracting services or ends up as yet another orphaned initiative. In the meantime, it offers certain benefits and uses, as do several other free Web-based scholarly search tools such as Scirus. Unfortunately, none of them is even close to comprehensive. Each tool covers some segments of the literature exclusively and covers others in very different ways.

THE FREE SCHOLARLY TOOLS

Google Scholar is just one of the more recent additions to a long line of academic, scientific, and other scholarly Internet search tools. In the early days of the Web, bibliographic databases such as UnCover and Agricola were available along with many library catalogs. Now many more bibliographic databases exist along with working papers, preprint and e-print collections, free journals, and many other specialized scholarly resources.

Hundreds of free, academic-oriented tools are available; hundreds of commercial ones are available as well. Academic libraries subscribe to a multitude of commercial online bibliographic and full-text resources and create links to many of the free tools. Covering all of these tools is well beyond the scope of this column, so I’ll just take a look at two of the broad, multi-disciplinary free Web resources, with some comparison to commercial resources.

One of Google’s great advantages is its incredible public relations ability and the general buzz it creates with new announcements such as Scholar. If use of Google Scholar rises, it may help lead more users to an institution’s subscriptions. Elsevier’s Scirus, which has coverage similar to Google Scholar’s and has been around longer, is a less-well-known scientific search engine covering journal articles and Web sites.

GOOGLE SCHOLAR

Google Scholar aims to include “peer-reviewed papers, theses, books, preprints, abstracts, and technical reports from . . . academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the Web” (see http://scholar.google.com/scholar/about.html). In practice, Google Scholar includes Web pages that look like an article or some other scholarly document.

Even after six months, with Scholar still in beta, Google will not release a list of its sources. However, it’s clear that Scholar includes journal articles from various publishers, abstracts from bibliographic databases, and data from e-print servers. Some prominent collections include ACM, Annual Reviews, arXiv, Blackwell, IEEE, Ingenta, Institute of Physics, NASA Astrophysics Data System, PubMed, Nature Publishing Group, RePEc (Research Papers in Economics), Springer, and Wiley Interscience, although not all in their entirety. Many Web sites from universities and nonprofit organizations are included, but only for documents that look like scholarly journal articles.

From all these sources, Google Scholar displays several types of records:

• Web documents

• Article citation-only records

• Book citation-only records

Each of these types looks different and has its own accessibility issues. What I call “Web documents” are records whose title links to a Web page that either describes the document or links directly to an online version of it. The citation-only records for articles and books have a [citation] or [book] notation, respectively, before the title. These records have been extracted from the bibliographies in the Web documents and do not link directly to additional information. The article citation-only records would be much more useful if they included volume, issue, and page numbers.

Anyone using Google Scholar needs to understand what the other links on each record do. Web document records can have multiple sources, as in the “Occupational Allergy to Cyclamen” article, which lists both a Blackwell-Synergy link and one from ncbi.nlm.nih.gov (a PubMed citation). The title links to the first listed source. If there are more than three sources, Scholar may display an “all X versions >>” link, where X is the total number of sources. The multiple sources can point to various Web pages: abstracts, preprints, the publisher’s copy, author copies, and more.

The “Cited by X” link leads to the Web documents that include the given record in their bibliographies. The “Web Search” link runs a regular Google search using the primary author’s last name and a phrase search of the document title, which can pull up other documents not in Google Scholar. The “Library Search” link, which appears on book citation-only records, connects to an Open WorldCat search for the book.
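To make the “Web Search” behavior concrete, here is a minimal Python sketch of how such a query could be assembled: the primary author’s last name as an ordinary search term plus the title in quotation marks so that it is treated as a phrase. The base URL, the helper name `web_search_url`, and the sample author and title are assumptions for illustration; the exact link Scholar generates may differ.

```python
from urllib.parse import urlencode

# Assumed base URL for a regular Google Web search (illustrative only).
GOOGLE_SEARCH = "http://www.google.com/search"


def web_search_url(author_last_name: str, title: str) -> str:
    """Hypothetical helper: combine a last name with a quoted title phrase."""
    # Quotation marks around the title request a phrase search;
    # the last name stays an ordinary, unquoted term.
    query = f'{author_last_name} "{title}"'
    # urlencode escapes spaces as "+" and quotation marks as "%22".
    return GOOGLE_SEARCH + "?" + urlencode({"q": query})


print(web_search_url("Smith", "Example Article Title"))
```

Because the title is searched as a phrase rather than as separate words, this kind of query tends to surface other copies of the same document elsewhere on the Web.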

The “UC-elinks” link in the example is an OpenURL link for the University of California system. In the Google Scholar preferences, a user can choose up to three resolvers from a few dozen academic institutions. These links connect to library-licensed full-text content and additional information, but only if the searcher knows to set this preference and is at one of the few institutions listed. Moreover, OpenURL links do not appear on all records, or even on all Web document records from fee-based publishers. The “Genetic Structure” record in the example should have one but does not.
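For readers unfamiliar with OpenURL, the following Python sketch shows the general shape of such a link: a library resolver’s base URL followed by citation metadata passed as query parameters. It uses key names from the OpenURL 0.1 convention (`genre`, `aulast`, `atitle`, `volume`, `spage`, `date`, and so on); the resolver address, the helper `openurl_link`, and the sample citation are hypothetical, and the links Scholar actually generates may use different keys. The sketch also shows why missing volume, issue, and page data matters: absent fields are simply dropped, leaving the resolver less to match on.

```python
from urllib.parse import urlencode

# Citation keys from the OpenURL 0.1 convention, in the order we emit them.
OPENURL_01_KEYS = (
    "sid", "genre", "aulast", "aufirst", "atitle", "title",
    "issn", "volume", "issue", "spage", "epage", "date",
)


def openurl_link(resolver_base: str, citation: dict) -> str:
    """Hypothetical helper: build an OpenURL 0.1-style link to a resolver."""
    # Keep only recognized keys that actually have a value; a citation
    # lacking volume/issue/pages yields a sparser, less precise link.
    params = [(key, citation[key]) for key in OPENURL_01_KEYS if citation.get(key)]
    return resolver_base + "?" + urlencode(params)


print(openurl_link(
    "https://resolver.example.edu/openurl",  # placeholder resolver address
    {"genre": "article", "aulast": "Smith", "atitle": "Example",
     "volume": "12", "spage": "34", "date": "2004"},
))
```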

SCIRUS

Back in 2001, Elsevier launched Scirus as a Web search engine that would search both Elsevier’s online journals in ScienceDirect and selected science-oriented portions of the Web. In its earliest days, Scirus covered a fairly limited portion of the published and Web-accessible scholarly literature. It has since grown to include Academic Press articles, MEDLINE citations, and, most recently, 13 million patents. Other article sources include BioMed Central, Crystallography Journals Online, Project Euclid, Scitation, and the Society for Industrial & Applied Mathematics. Web-accessible preprints are available from arXiv, CogPrints, and NASA. Its Web coverage has also expanded to include more sources. Unlike Google Scholar, which includes only Web documents that look like articles, Scirus includes regular Web pages.

Some of these scholarly resources, such as BioMed Central, PubMed, and arXiv, are covered by both Scirus and Google Scholar. One major collection is included only in Scirus: the 1,800-plus Elsevier journals, which are among Scirus’s largest collections. Although they are not included in Google Scholar as Web documents, some Elsevier articles may show up there as article citation-only records.

COMPARISONS

A number of other authors have noted some problematic limitations with the early Google Scholar. Péter Jacsó’s December 2004 review at the Digital Reference Shelf [http://snipurl.com/dwco] contains an extensive critique and provides evidence that far fewer documents are found with Google Scholar compared to the native search interface of the publisher. He also created a tool for ongoing comparisons [http://snipurl.com/dxjx]. In February, Rita Vine noted in her blog [http://snipurl.com/dwda] that the PubMed records in Google Scholar are missing the most recent year’s records and are much less complete than a direct search at PubMed. Unfortunately, both of these problems continue as of April 2005.

A quick search for the terms protonation alkylation at Scirus reports 2,068 journal article hits and another 1,524 Web results. The same search at Google Scholar reports “about 1,820” records of all types. Given Google’s usual difficulty in counting results accurately, that number is probably within about 500 records of the actual total. On other searches Scholar finds more, but since each tool covers unique content, neither is comprehensive. The same search in the native interface of the American Chemical Society (ACS) publications database finds 21,685 articles. The ACS journals are included in neither Scholar nor Scirus.

Results also vary for a source such as PubMed that both tools include. A search on cicatrix finds 12,780 results at PubMed, 11,058 at Scirus, and only 7,660 at Scholar. Because of Scholar’s lag in adding recent records, limiting the search to results from 2001 gives a fairer comparison; even then, PubMed finds 461, Scirus 420, and Google Scholar only 294.
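The cicatrix figures are easier to compare as proportions. This short Python calculation expresses each tool’s reported hit count as a percentage of PubMed’s. It treats the reported counts as directly comparable, which is itself an assumption, since result counts from these engines are estimates.

```python
# Hit counts for the search "cicatrix", as reported in this column.
all_years = {"PubMed": 12_780, "Scirus": 11_058, "Google Scholar": 7_660}
year_2001 = {"PubMed": 461, "Scirus": 420, "Google Scholar": 294}


def coverage_vs_pubmed(counts: dict) -> dict:
    """Return each tool's count as a percentage of PubMed's, to one decimal place."""
    base = counts["PubMed"]
    return {tool: round(100 * n / base, 1) for tool, n in counts.items()}


for label, counts in (("All years", all_years), ("2001 only", year_2001)):
    print(label, coverage_vs_pubmed(counts))
```

Rounded, Scholar returns about 60% of PubMed’s count across all years and about 64% for 2001, while Scirus returns about 87% and 91%, so restricting the search to an older year narrows Scholar’s gap only slightly.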

Both Scholar and Scirus search the full text of articles, but inconsistently: searching for phrases found toward the end of an article may fail to retrieve it. For online journal packages that include full-text searching, the native search interface will be more comprehensive. On the other hand, some online journal suites lack full-text searching, in which case Scholar or Scirus may be the more comprehensive option.

For fielded searching using authors, date, subject terms, or article type, the commercial databases and native search interfaces have many more choices. Scholar does have author, title, date, and publication fields in the advanced search, but the fields are far less reliable than in a structured database. More problematic is the lack of any date sort capabilities in Scholar. At least Scirus has date sorting. The Scirus advanced search has field choices for author, title, date, keyword, ISSN, author affiliation, and publication along with limits for broad subject areas, collections, file formats, dates, and information types.

The freshness of these databases is a significant issue. As Joann Wleklinski noted in her May/June 2005 ONLINE article (“Studying Google Scholar: Wall to Wall Coverage?,” pp. 22–26), the database used by Google Scholar is static at this point—it’s not adding newer documents. Scholar definitely needs to be updated more frequently. In fact, at this point, the main Google Web search is a much better tool for finding recent scholarly documents than Google Scholar.

USES

Despite all these limitations and problems, both tools are worth using for more than just watching how they develop. For a quick, broad, multidisciplinary search on a very narrow, specific topic, either Scholar or Scirus can give a good start. For citation verification, both can help find erroneous as well as correct citation information. The Cited By links at Google Scholar can be a useful adjunct to the more comprehensive citation tracking from citation indexes via ISI’s Web of Science (or can function as a partial replacement for those without access).

At this point, my main use of both is for finding free Web versions of otherwise inaccessible published articles. I found a number of full-text articles via Google Scholar that are PDFs downloaded from a publisher site and then posted on another site, free to all. Both Scirus and Scholar were also useful for finding author-hosted article copies, preprints, e-prints, and other permutations of the same article.

For the unaffiliated scholar, or for those in a small organization (government, association, or small research lab), these tools provide both opportunity and frustration. The opportunity? These scholars can use both tools to search for resources. The frustration comes when a specific document is found but is available online immediately only to those willing (and able) to pay.

Strangely enough, both of these tools may work better, or at least appear to work better, for the affiliated scholar. With all the campus subscriptions authenticated by IP address, the campus-based researcher finds that the links in Google Scholar and Scirus work seamlessly, providing direct access to the full-text articles. Both would work better if the appropriate OpenURL resolver were added automatically based on the searcher’s IP address, since many institutions have multiple access points or, like us, have their Elsevier subscriptions on a non-ScienceDirect platform.
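The suggestion of choosing a resolver automatically by IP address can be sketched as a lookup from campus network ranges to resolver URLs. In this minimal Python version, the networks are reserved documentation ranges, and the resolver URLs and the `resolver_for` helper are hypothetical; a real service would need each institution’s actual address ranges and some way to handle off-campus users.

```python
import ipaddress
from typing import Optional

# Hypothetical mapping of campus networks to OpenURL resolvers. The ranges
# are reserved documentation networks; the URLs are placeholders.
RESOLVERS_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "https://resolver.example.edu/openurl",
    ipaddress.ip_network("198.51.100.0/24"): "https://links.example.org/resolve",
}


def resolver_for(client_ip: str) -> Optional[str]:
    """Return the resolver for the network containing client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for network, resolver in RESOLVERS_BY_NETWORK.items():
        # Membership test; an IPv6 address never matches an IPv4 network.
        if addr in network:
            return resolver
    return None


print(resolver_for("192.0.2.15"))
print(resolver_for("203.0.113.7"))
```

A searcher whose address matches no listed range gets no automatic resolver, so a manually set preference would still be needed in that case.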

At my library, scholarly information searching remains with our library’s commercial databases or via the general Web search engines rather than using either Scholar or Scirus. The one student who mentioned using Google Scholar prior to coming to the reference desk expressed frustration with it since, “Everything I found cost money.” Had our OpenURL resolver been enabled, that might have helped, but her question was answered far better in one of our commercial databases. Both Scholar and Scirus have potential for information professionals and end users. At this point, each covers a certain segment of scholarly material, but plenty of problems remain. Other search tools continue to serve the scholarly community better.

 


Greg R. Notess [greg@notess.com; http://www.notess.com/] is a reference librarian at Montana State University and founder of SearchEngineShowdown.com.

Comments? E-mail letters to the editor to marydee@infotoday.com.

