NEWSBREAK UPDATE
The Latest on Enterprise Search Products,
E-Books, and More
by Paula J. Hane
As we head into the busy late-fall season of conferences,
end-of-year company reporting, holidays, and school
activities, the flow of press releases hitting my
desk seems to have accelerated as well. Many of us
in the industry will be meeting and greeting at venues
like Internet Librarian 2003 in Monterey, Calif., and
Online Information 2003 in London. Users will be looking
over the latest products and services. "Will they buy?" is
the question. Information providers will be checking
out competitors' offerings as well as potential partners
for technology and marketing alliances. There should
be a steady supply of news to report, as vendors continue
to announce new products and services through the end
of the year. Here's a wrap-up of some of the news from
the past month.
Gale Branches Out
Another traditional information provider has linked
up with one of the big Web search companies. (See the
September 2003 NewsLink Monthly Spotlight at https://www.infotoday.com/newslink/newslink0309.htm.)
Gale, a subsidiary of Thomson Corp., has arranged to
link from most of its InfoTrac products to Google's
image collection. Gale chose not to license content
from expensive commercial image providers but instead
will implement Google Image Search. This partnership
will allow Gale users to search Google's more than
425 million images at no extra charge. (For more information,
see the NewsBreak at https://www.infotoday.com/newsbreaks/nb031006-1.shtml.)
Branching out into another information format, Gale
announced in July its forthcoming e-book program that
would offer a collection of reference titles with an
easy-to-use database interface. The Gale Virtual Reference
Library, which was set to launch at press time, allows
libraries to select from an initial collection of 85
reference sources (encyclopedias, almanacs, and
series) to create a customized, integrated online
information service with unlimited usage and 24/7 remote
access.
The Virtual Reference Library offers flexible options
that let libraries choose to buy one e-book or multiple
e-books and search across a single e-book or an entire
e-book collection. The content is provided in HTML
format, so no special software reader is required.
An Adobe PDF option is available to view the actual
page layout. In addition, the company says it will
be releasing several hundred more e-book titles. Also,
in 2004, it will publish directory sources with a special
interface, allowing for more granular access to fielded
data.
An E-Book Revolution?
The Open eBook Forum (OeBF), the electronic publishing
industry's trade and standards organization, recently
claimed that e-books have "quietly become a major force
in the worlds of media and technology." The group reported
that in the first half of 2003, e-book sales revenues
were up by 30 percent and unit sales were up by 40
percent over the same period in 2002. This compares
with an annual growth rate of only about 5 percent in
traditional print publishing. E-book sales are expected
to top $10 million in 2003.
OeBF revealed statistics on the current state of
e-books and provided an industry analysis in its first
quarterly "eBook and eDocument Publishing and Retail
Statistics" report. The quantitative assessment was
compiled from data submitted by 34 publishers and retailers.
Some of the information can be found at http://www.openebook.org,
but the full findings were made available only to participating
companies and OeBF members.
"Those of us in the industry have been seeing real
signs of growth from every direction," said OeBF executive
director Nick Bogaty. "Libraries are a huge growth
category as they look to revitalize themselves in the
age of Google, school systems are finding that today's
kids like to read when the media is digital, and consumers
are snatching up better devices and more titles as
fast as they can."
I don't think the consumer market is really ready
for mass-market e-book sales, which is why barnesandnoble.com
stopped selling them in September. I continue to believe
that certain types of content are much better suited
to the e-book format than others: for example,
rapidly changing information, user guides, and reference
materials. Thus, I think that Gale will find considerable
success with its Virtual Reference Library, as have
Knovel (which provides access to key STM reference
materials) and several other information publishers.
Mark Gross, president of Data Conversion Laboratory
(DCL), said, "An electronic book revolution has happened,
only it came in by stealth and was not reported by
the mainstream media." In a special DCLnews report
(http://www.dclab.com/stealth_ebooks.asp), he explained, "The
reason the media hasn't caught on to this story is
that digital books are not for everyone, and the digital
books people are using every day, such as technical
manuals and online reference titles, aren't perceived
as e-books."
In the report, Gross provided the following list
of "Five Prime Indicators for an eBook." They have
been summarized with the permission of Gross and DCL:
1. Very large amounts of data, of which readers
want to see only a small portion (e.g., reference
books, technical documentation, and legal libraries)
2. Data that change rapidly, such as technical
information and manuals
3. Rare books and manuscripts that are too
fragile to touch
4. Materials that have low publication and
distribution volumes
5. Self-published books
For additional commentary and perspective on e-books,
see Mick O'Leary's column in the September/October
2003 issue of ONLINE (p. 59). He stresses the
importance of institutional sales over sales
to individuals and of subscription over transactional
pricing.
Linking Update
In last month's column, I reported on a number of
important linking-initiative announcements, noting
that this is a hot area of news. Recently, NFAIS, the
association for "organizations that aggregate, organize,
and facilitate access to information," released a document
titled "NFAIS Guiding Principles: Reference Linking" (available
at http://www.nfais.org/2003_Guiding_Princ_Ref_Linking.htm).
The group is encouraging all those involved in any
aspect of information creation or distribution to provide
for a reference-linking capability in their products
and services.
With so many linking arrangements already in place,
I wondered about the need for an official statement
at this point. Linda Beebe, chair of the NFAIS linking
committee and senior director of the American Psychological
Association's PsycINFO, acknowledged that "linking
is alive and well, but there's still a lot of work
to be done." She noted that the primary publishers,
particularly those in the STM field, have led the way
in reference-linking initiatives. The NFAIS committee,
along with the entire NFAIS board, felt it was important to
make a collective statement that would encourage other
publishing disciplines and the secondary publishers (whose
products are usually delivered on third-party platforms) to
work on collaborative linking.
"The organization strongly believes that industrywide
collaboration in support of reference linking is essential
to managing the flow of scholarly communication," said
NFAIS president Marjorie Hlava. "Reference linking
provides a seamless navigation between bibliographic
and full-text databases, speeding the research process
and ultimately accelerating discovery across all scholarly
disciplines as well as in business."
This is a worthy cause indeed. Let's hope to see
the widespread adoption of these principles.
Enterprise Search Is Hot
I've noticed a recent buzz of activity from companies
announcing new and improved enterprise search products.
Most of the newer products do much more than just provide
keyword searching. The clear trend is to integrate
entity extraction, linguistic technologies, taxonomies,
and classification with search technology to offer
users better search results with less work.
I recently reported on the launch of Endeca's new
ProFind 4.0. (See the October 2003 NewsLink Monthly
Spotlight at https://www.infotoday.com/newslink/newslink0310.htm.)
This enterprise search solution uses the Endeca Search
and Guided Navigation engine, which combines full-text
searching with navigation capabilities. The company
says ProFind is different from other search engines
because of its ability to discover relevant relationships
in data and find accurate and precise results with
unprecedented speed.
Endeca ProFind can handle all types of content (both
structured and unstructured) within an enterprise,
including databases, documents, and e-mail. Business
partners like ClearForest provide rules-based native,
entity, and concept extraction from the content. ProFind
can be integrated with existing taxonomies.
Copernic, a company known for Copernic Agent, its
consumer metasearch product, officially launched Copernic
Enterprise Search. (See the NewsBreak at https://www.infotoday.com/newsbreaks/nb031006-2.shtml.)
The company chose not to target the high-end enterprise
search market of Fortune 500 firms, which is currently
dominated by companies like Verity and FAST. It instead
offers a product that's specifically designed to meet
the needs of the small-to-medium-sized enterprise (SME)
and the departments of larger enterprises.
Copernic Enterprise Search uses advanced linguistic
and statistical technologies to identify the key concepts
and sentences of indexed documents. It also does automatic
indexing of new and updated documents in real time.
In addition to handling internal information in many
formats, the software can index external Web pages
and supports indexing of XML feeds.
Northern Light has been on a comeback path since it was repurchased
in the wake of the "Divine demise." (See the NewsBreak at https://www.infotoday.com/newsbreaks/nb030602-1.shtml.)
Known since its founding in 1996 for its taxonomy
and classification, which use patented clustering technology,
and for its results folders, Northern Light
employs search, classification, and content integration
technology and services to offer user-friendly search
solutions for corporate clients. Although it hasn't
made an official announcement, the company, led again
by CEO David Seuss, has released its Northern Light
Enterprise Search Engine, an offering that it says
delivers performance, relevance, and "unparalleled
scalability."
The Northern Light Enterprise Search Engine for the Solaris
operating system uses the technology that powered
the Northern Light Web search engine. It can search
up to 25 million documents on a single software installation
on a single server. The price is certainly right. A
license for a 150,000-document database is only $2,500
per year, including support and updates. The company
also offers a free 30-day evaluation copy to install
and try. Watch for additional news about Northern Light
coming soon.
Other vendors operating in this space are Autonomy,
Convera, Inxight, Stratify, and Verity. Some of the
companies that provide additional pieces of technology
and partner with search vendors include iPhrase, Antarctica,
Intelliseek, and ClearForest.
Information Discovery
At press time, ClearForest Corp. announced the availability
of ClearForest 5.0. The company offers products that
read vast amounts of structured and unstructured text;
extract relevant information that's specific to users'
requirements; and provide visual, interactive, and
textual executive summaries. The new 5.0 release adds
relationship-analysis tools, four new industry-specific
solution modules, and enhanced database scalability.
ClearForest can now tag and analyze
Arabic and Hebrew in addition to Western European languages.
ClearForest is not a search engine, but it can work
with search engines. Its ClearTags platform produces standard
tagged XML that can be searched with other software,
such as Endeca. Barak Pridor, ClearForest's CEO, calls
it a business intelligence solution that provides for
the discovery of facts, patterns, trends, and relationships,
which would otherwise be hidden within an organization's
unstructured data. He said: "If you know what you're
looking for, use a search tool. If you don't know what
you're looking for, use a discovery tool."
By the way, ClearForest uses some nifty visualization
technology to clearly represent the revealed relationships.
While Information Today has covered some developments
in visualization technologies and products over the
last few years, the trend toward incorporating visual
representations finally seems to be making inroads
into solid applications like this one.
Antarctica Systems, Inc., a company built on the
principle that most people respond better to visual
presentations, recently announced version 4.0 of its
Visual Net (VN) software. VN provides a map interface
to information of all kinds. The company redesigned
the entire user interface, upgraded the underlying
technology, and built in additional interactivity.
The changes position VN to handle the data complexities
of large enterprises, a market it's now heavily targeting.
Antarctica also partners with business-software vendors
and search engines. (See the NewsBreak at
https://www.infotoday.com/newsbreaks/nb030929-2.shtml.)
Finally, IBM launched its long-awaited WebFountain,
a Web-scale text-mining and discovery platform that
extracts trends, patterns, and relationships from massive
amounts of unstructured and semistructured text. With
more than 1 petabyte (1,024 terabytes) of content already
in storage, it's well on its way to mining the entire
Web.
The WebFountain platform will be used to develop
new products and services in partnership with other
companies. It offers some truly impressive components:
a supercomputer-based infrastructure; multi-terabyte
data stores; and text analytics that include natural
language processing, statistics, probabilities, machine
learning, pattern recognition, and artificial intelligence.
The possibilities for this information-discovery platform
could be tremendous.
Factiva announced a partnership with WebFountain
to develop a service called Reputation Manager, which
is scheduled for release in the second quarter of 2004.
Reputation Manager will combine more than 2 years of
Factiva content with WebFountain's Web data to let
executives discover what the world is saying about
a company and its products. When premium content is
combined with Web content, it's not hard to envision
any number of potentially useful business information
tools. (See the NewsBreak at
https://www.infotoday.com/newsbreaks/nb030922-1.shtml.)
Web Search
With companies like IBM crossing over into the Web,
it's logical to wonder what search engine companies
like Google and Yahoo! will do next. Who knows what
they might be working on or testing already?
Meanwhile, what seems to interest them most is
shopping and opportunities to sell. Yahoo! recently
rolled out Yahoo! Product Search, an e-commerce search
engine that will power the redesigned Yahoo! Shopping
(http://www.shopping.yahoo.com). The site offers features
such as side-by-side product comparisons, detailed
buyer's guides, a tax and shipping calculator tool,
consumer product and merchant ratings, and product reviews.
I have to admit, the advanced search features
are pretty effective. Even Chris Sherman of SearchDay
said the Pin Point product-recommendation tool is "seriously
cool."
Several media outlets report that Amazon.com has
formed A9.com, a new, independent business unit that's
charged with building a shopping search tool for internal
use and for other companies. And Google has been beta-testing
its Froogle product-comparison search since December
2002.
Also new at Google Labs, the company's technology
showcase, is Search by Location, a feature that lets
users focus a search on a specific U.S. location and
then provides a map from MapQuest with the results
marked. Overture is reportedly also testing a local
search capability. In a recent SearchDay article, Danny
Sullivan claimed that in preliminary tests, local searching
was generally still a disappointing experience.
In other news, Amazon and Microsoft announced that
Amazon will provide Microsoft Office 2003 users with
seamless access to Amazon.com from within Microsoft
productivity applications via the Research Task Pane.
Users will be able to access Amazon.com information
and make purchases without launching a browser or leaving
their document, e-mail message, or presentation. (I
told you selling was big.) Partners previously announced
for the Research Task Pane include Factiva,
Gale, Alacritude, LexisNexis, Ovid, and Elsevier.
Here's a cautionary note that illustrates the danger of
putting too many eggs in one basket: LookSmart,
a paid search provider, lost its contract with Microsoft,
which had accounted for more than half (!) of its revenues.
LookSmart shares have plummeted.
The Wave of the Future
The eBusiness Research Center (eBRC) at Penn State's
Smeal College of Business Administration has launched
SmealSearch (http://smealsearch.psu.edu), a new niche
search engine that targets business research documents
on the Internet. SmealSearch finds and catalogs academic
articles, working papers, white papers, consulting
reports, magazine articles, published statistics, and
business facts by crawling the Web sites of universities,
commercial organizations, research institutes, and
government departments.
SmealSearch is built on NEC Research Institute's
CiteSeer, the largest search engine for scientific
literature. SmealSearch is the second search engine
launched by eBRC in the past year. It follows the late-2002
rollout of eBizSearch, a resource that helps researchers
access relevant and current information in e-business,
e-commerce, and other closely related topics.
"General-purpose search engines can only carry researchers
so far," said Lee Giles, associate director of research
at eBRC and creator of the technology on which SmealSearch
is based. "In the future, we predict the evolution
of increasing numbers of powerful niche search engines
that address specific needs of specific audiences."
I couldn't agree more.
For the latest industry news, check https://www.infotoday.com every Monday morning. An easier option is to sign up
for our free weekly e-mail newsletter, NewsLink, which
provides abstracts and links to the stories we post.
Paula J. Hane is Information Today, Inc.'s news bureau chief
and editor of NewsBreaks. Her e-mail address is phane@infotoday.com.