NewsBreak Update
Updates on Projects, Partnerships, Improved Services, and More
by Paula J. Hane
I recently returned from the
Internet Librarian conference. As usual, it offered a useful mix of practical
and visionary presentations along with excellent networking opportunities.
Though the event is organized by ITI, the publisher of Information Today,
my roles were as attendee and reporter.
The conference (in the ever-popular Monterey, Calif., location) drew
more than 1,100 participants despite continuing economic woes and curtailed
travel budgets. I attended excellent sessions on e-resources and digital libraries,
searching and search engines, content management strategies, technology trends,
and more. Look for coverage in the January 2004 issue of Information Today.
Definitely Too Much Info
Researchers Peter Lyman and Hal R. Varian from the University of California-Berkeley's
School of Information Management and Systems have released the results of their
study "How Much Information? 2003" (http://www.sims.berkeley.edu/research/projects/how-much-info-2003).
According to the report, "Newly created information is stored in four physical
media (print, film, magnetic, and optical) and seen or heard in four
information flows through electronic channels: telephone, radio and TV, and
the Internet."
The study found that we produced about five exabytes of new information in
2002. Of this, 92 percent was stored on magnetic media, mostly hard disks.
According to the report, five exabytes of information is equivalent to the
information contained in a half million new libraries the size of the Library
of Congress' print collections. And this is just in 1 year. I knew there was
a reason I felt so overwhelmed.
The report's executive summary provides some interesting statistics. For
example: "In 2000, we estimated the volume of information on the public Web
at 20 to 50 terabytes; in 2003 we measured the volume of information on the
Web at 167 terabytes, at least triple the amount of information. The surface
Web is about 167 terabytes as of summer 2003. BrightPlanet estimates the deep
Web to be 400 to 550 times larger, thus between 66,800 and 91,850 terabytes."
Interestingly, the authors say they view the report as a "living document" and
intend to revise it based on comments, corrections, and suggestions.
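For readers who want to check the arithmetic behind these figures, here is a minimal sketch. It only uses numbers quoted above; the function name is illustrative, and the per-library figure assumes decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes), which the column does not specify.

```python
# Figures quoted from the report's executive summary, in terabytes.
SURFACE_WEB_TB = 167
DEEP_WEB_LOW_TB = 66_800
DEEP_WEB_HIGH_TB = 91_850


def implied_multiplier(deep_tb: int, surface_tb: int) -> float:
    """Return how many times larger the deep Web figure is than the surface Web figure."""
    return deep_tb / surface_tb


low = implied_multiplier(DEEP_WEB_LOW_TB, SURFACE_WEB_TB)
high = implied_multiplier(DEEP_WEB_HIGH_TB, SURFACE_WEB_TB)
print(f"Implied deep Web multipliers: {low:.0f} to {high:.0f}")

# Assumption: decimal units (not stated in the column).
EXABYTE = 10**18
TERABYTE = 10**12

# Five exabytes spread over half a million Library of Congress-sized print collections.
per_library_tb = 5 * EXABYTE / 500_000 / TERABYTE
print(f"Implied size of one such print collection: {per_library_tb:.0f} terabytes")
```

Dividing the quoted bounds by the surface Web figure recovers the multipliers implied by the range, and the second calculation shows the size of a single Library of Congress print collection that the "half million libraries" comparison implies.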
The bottom line is that we're drowning in data. (Some days it feels like
spam alone could do me in.) The severity of this situation, especially for
businesses, makes content management, knowledge management, and search/browse/discover
tools ever more critical for handling it all. Solutions from
companies I discussed in last month's column (those that provide information-extraction
tools for unstructured content and those that deal with indexing, taxonomies,
classification, clustering, content integration, information discovery, relationship
analysis, etc.) will be more in demand.
We're also seeing some welcome interface developments that incorporate visualization
and other techniques to let users work within more contextual information spaces.
A number of the presentations at Internet Librarian related to these issues
and solutions.
Getting into Books
Amazon recently introduced "Search Inside the Book," a full-text search feature
that lets users delve directly into the content of books. This new service
taps more than 33 million pages from more than 120,000 nonfiction and fiction
titles provided by 190 publishers. In November (https://www.infotoday.com/newsbreaks/nb031103-1.shtml),
Barbara Quint reported on publishers' reactions and explained the copyright
issues raised by authors. Amazon indicated that a week after launching the
new service, sales for full-text searchable books outpaced those for other
books by 9 percent.
Quint said: "Full-text book searching has been available for some time from
such services as ebrary, OCLC's netLibrary, even Project Gutenberg and other
public-domain book sites. And those are just the more current and Web-oriented
examples. Earlier examples in traditional online also exist. However, to offer
it at this scale with such a promise of future growth and at no charge to the
user or the user's institution, does promise a new level of access for 'library-quality'
material, not to mention more revenue for Amazon."
OCLC, Google
One development of great interest is OCLC's recent announcement that its
Open WorldCat pilot project will begin testing access to WorldCat records through
Google. OCLC is working with selected Web sites, including the Google search
engine, to provide links to the records of WorldCat libraries. This will ultimately
help users find local libraries that have the items they want. The project
is using a 2-million-record subset of the most popular and widely available
books from the more than 53 million records in WorldCat. OCLC will analyze
Open WorldCat using feedback, surveys, and statistics. In June 2004, OCLC will
decide whether to expand, continue, or end the project.
The goals of the project are to expand the visibility (and utility) of
libraries and increase the quality of materials that are accessible from the
Web. In an October NewsBreak
(https://www.infotoday.com/newsbreaks/nb031027-2.shtml),
Barbara Quint noted that the expansion to include OCLC's records clearly fits
Google's mission statement to "organize the world's information and make it
universally accessible and useful."
More Google Activities
Google Labs has released Google Deskbar, a search application experiment
that lets PC users perform Google searches at any time from any application
without opening a browser. Google Deskbar is a free software download (from
http://labs.google.com) that appears as a search box in the Windows taskbar.
I haven't had time to test it out yet.
Earlier this year, I downloaded Quick Search Deskbar, a similar tool from
HotBot (which let me search using Google and three others). Gary Price of ResourceShelf
says that this offering is actually more robust than Google Deskbar. While
I thought the HotBot application offered some very handy shortcuts (and I liked
its one-click access to the Online Crossword Dictionary), I found that I just
forgot to use it. I was frequently in my browser anyway, so it was just as
easy to stick with my regular search habits. I have the Google Toolbar loaded
in my browser, and I use that quite often. Yes, we are very much creatures
of habit.
The Google Toolbar, by the way, was the recipient of the recently announced
Association of Independent Information Professionals' 2003 Technology Award.
Google says that Toolbar and Deskbar are complementary products that each accommodate
a particular search need.
Microsoft Update
Microsoft hopes to help users change their habits and get them to stay with
the familiar Microsoft Office applications, hopefully the upgraded Office
2003 package of products. The company's goal is to enable users to easily access,
integrate, and utilize information from diverse sources. The list of information
providers that partner with Microsoft to offer search connectivity from within
the Office Research Pane continues to grow. The companies that provide services
from within applications like Word and Excel include Factiva, Gale, Alacritude
(eLibrary), LexisNexis, Ovid, and Elsevier.
OneSource Information Services introduced several new business-intelligence
modules that deliver OneSource information (such as company profiles, industry
reports, executive details, news, and financial data) directly into Microsoft
Office System programs. The solutions include the Catalyst/Account Intelligence
Module for Microsoft Office Word 2003 and the Catalyst/Financial Analysis Module
for Microsoft Office Excel 2003. OneSource is also supporting the Research
Task Pane within Microsoft Office.
Microsoft recently announced that it has teamed up with EDGAR Online, Inc.
EDGAR Online's secure XML Web service will transmit XBRL (eXtensible Business
Reporting Language) financial-statement data from EDGAR Online Pro to Excel
2003 through the Office Solution Accelerator for XBRL. This will allow investors
and analysts to use EDGAR Online's financial information for analysis directly
on their desktops. The companies expect this to be available in the first quarter
of 2004.
According to Joe Wilcox of Jupiter Research, Microsoft is trying to turn
Office, like Windows, into a platform onto which developers and businesses
build other programs or custom applications. The browser is becoming less important
as its functions are integrated elsewhere. And The Wall Street Journal recently
reported that Microsoft's new "operating system due out in 2005, code-named
Longhorn, is expected to help users simultaneously search the Web, their own
hard drives, and data on corporate networks." These are certainly developments
to monitor closely.
By the way, while Google prepares for its initial public offering of stock,
unconfirmed reports indicate that Microsoft and Google have discussed a partnership,
merger, or even a possible takeover of Google. At press time, the companies'
executives weren't talking, but the media was abuzz with speculation and commentary.
Some media outlets said that the Google Deskbar was Google's answer to cutting
out the browser and challenging Microsoft.
Alacritude Chooses FAST
I reported in my September 2003 column that Alacritude was changing its focus
from content to helping folks conduct more effective online searches. Now,
the company has announced that it has selected FAST Data Search to power the search
functionality of its online research services eLibrary and Encyclopedia.com.
Patrick Spain, chairman and CEO of Alacritude, said: "In addition to increasing
the access speed and relevance of the tens of millions of proprietary newspaper,
magazine, and journal articles in our database, our services will be significantly
enhanced by FAST's powerful alerting and clustering features. This is just
the first step in a complete retooling of our online research services, which
we plan to complete early next year."
In August, LexisNexis integrated FAST Data Search with its LexisNexis Total
Search. FAST has now amassed a noteworthy list of customers, including FirstGov.gov,
Reed Elsevier, Reuters, and T-Online (Deutsche Telekom). In addition, a substantial
number of former AltaVista enterprise customers have renewed agreements or
have begun migrating to FAST's technology. FAST purchased AltaVista's enterprise
search technology earlier this year.
FAST recently reported some impressive financials. Its third-quarter 2003
revenues reached $11 million, an increase of 15 percent over the second quarter.
And "year-to-date revenues grew 18 percent from the same period last year,
as new business grew 60 percent for the year."
Bravo Vivísimo
Another company whose technology helps deliver more effective search results
is Vivísimo, which uses clustering to organize results into folders
or categories. InfoSpace, Inc. announced that it has selected the Vivísimo
Clustering Engine for deployment on its Web properties "to enhance the user
search experience." The clustering feature is now available across InfoSpace
Search & Directory's Web metasearch properties, which include Dogpile,
WebCrawler, and MetaCrawler.
Fortune 500 companies, government agencies, and publishers also use Vivísimo.
Cisco Systems recently licensed the Clustering Engine and Content Integrator
to complement the Google Search Appliance, a tool that serves thousands of
Cisco engineers. According to Vivísimo, the Content Integrator metasearches
collections on the Appliance through a single query. The Clustering Engine
then organizes the combined search results into folders.
Clustering and folders are of course not unique to Vivísimo. Northern
Light and other search engines also provide results folders. However, Vivísimo
claims that its technology is different: "Unlike solutions that require huge
investments in taxonomy-building or categorization, Vivísimo's Clustering
Engine organizes search results into folders on the fly, without requiring
any pre-processing of source documents." Vivísimo, headquartered in
Pittsburgh, was founded in June 2000 by Carnegie Mellon University computer-research
scientists.
ERIC Changes
In April, Barbara Quint reported on proposed changes to the ERIC database
(https://www.infotoday.com/newsbreaks/nb030421-1.shtml). ERIC has operated through
a network of 16 subject-specific clearinghouses that are responsible for acquiring,
selecting, indexing, and abstracting materials
in their area of interest for inclusion in the database. The clearinghouses
have provided information in response to requests by mail, phone, and e-mail.
Responses typically included a short list of citations from the ERIC database,
full-text articles, and appropriate referrals. Unfortunately, the changes are
now imminent.
The following notice was posted on the ERIC site in early November:
Changes Coming to ERIC December 19, 2003
ERIC will begin a transition in late December as a new U.S. Department of
Education contractor develops a new model for the ERIC database and services.
ERIC clearinghouses' Web sites, including AskERIC, and their toll-free telephone
numbers will close on December 19, 2003. As of that date, you will be able
to use this Web site to:
Search the ERIC database
Search the ERIC Calendar of Education-Related Conferences
Link to the ERIC Document Reproduction Service (EDRS) to purchase ERIC full-text documents
Link to the ERIC Processing and Reference Facility to purchase ERIC tapes and tools
If you have other ERIC bookmarks, we suggest you change them to http://www.eric.ed.gov.
For the latest industry news, check https://www.infotoday.com every Monday
morning. An easier option is to sign up for our free weekly e-mail newsletter,
NewsLink, which provides abstracts and links to the stories we post.
Paula J. Hane is Information Today, Inc.'s news bureau chief
and editor of NewsBreaks. Her e-mail address is phane@infotoday.com.