DEPARTMENTS
Letters to the Editor
Another View of De-Duplication
In the opinion of MuseGlobal, "The Truth About Federated
Searching" in your October 2003 issue contains a
number of statements that are erroneous. In the interest
of presenting your readership a more balanced view of
federated search technology, I'd like to correct
some of the misimpressions left by the article.
1. De-duplication does work.
WebFeat asserts that de-duplication doesn't work.
Their argument is that because a single search returns
a very large number of hits, say 100,000, you can't
claim to de-dupe unless you de-dupe every one of those
hits. This is like saying that a search doesn't work
unless you view all 100,000 of those hits.
Sure, it would take a long time to process all of
those records. In federated searching as in everything
else in life, there are trade-offs. Most searchers
initially retrieve a limited set of records from each
source. This allows the searcher to check their general
usefulness (and possibly decide to re-run the search
on fewer, more relevant sources). Not only does this
save time, but the searcher is not needlessly clogging
up servers by delivering thousands of records that
will almost immediately be discarded. In this way,
the de-duplication performance issue of dealing with
impossibly large numbers has been resolved. If the
first set of results is found wanting, the waiting
mass of results is still there and can be tackled
in manageable bites. With the right technology, the
next "bite" of results can be processed like the first,
and the new results can be quickly de-duped against
those left over from the first set of results.
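The batch-by-batch approach described above can be illustrated with a
minimal Python sketch. It is not MuseSearch's actual implementation: the
names Record, match_key, and MergedResults are hypothetical, and the
matching rule (normalized title, author, and year) is only an assumption
chosen to keep the example short. Real products would likely use more
forgiving matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical minimal record; real records carry far more fields.
    source: str
    title: str
    author: str
    year: int


def match_key(record):
    """Assumed matching rule: case- and whitespace-insensitive title and author, plus year."""
    def normalize(text):
        return " ".join(text.lower().split())

    return (normalize(record.title), normalize(record.author), record.year)


class MergedResults:
    """One integrated result set shared by all sources."""

    def __init__(self):
        # Maps each match key to the first record seen with that key.
        self._by_key = {}

    def add_batch(self, batch):
        """Merge one batch of records; return how many were duplicates."""
        duplicates = 0
        for record in batch:
            key = match_key(record)
            if key in self._by_key:
                duplicates += 1
            else:
                self._by_key[key] = record
        return duplicates

    def records(self):
        """Return the de-duplicated records in the order first seen."""
        return list(self._by_key.values())
```

Because the keys of everything already kept stay in one place, each new
"bite" of results is checked only against that set, so the cost of
de-duping grows with the number of records actually retrieved, not with
the total number of hits reported by the sources.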
It seems reasonable that you should be able to recognize
that a new record being added to the set is the same
as an existing one. Of course, to do this you have to merge
the results from multiple sources and process them
all in an integrated results set (as our
product, MuseSearch, and several other products do),
rather than maintaining them in separate groups by source (as
WebFeat does). We've
been de-duping since day one, but we would be the first
to say that de-duping isn't perfect by any means. In
fact, we often state publicly that metasearching is
an 80/20 solution: you're better off with metasearching
than without it, and it will only improve over time.
De-duping is one of the many differentiators among
federated search products; in fact, the ability to
de-duplicate results is one of the key requirements
articulated by users. Don't take our word for it; see
the detailed study sponsored by the National Library
of New Zealand that concludes "the consensus about
the role of a common user interface is that it should
be able to broadcast a single search to a variety of
databases in different locations and in different formats
and to unify the results from these databases, then
present them in a useful order and de-duplicate the
results" (emphasis added). This is just one of the
reasons the study awarded MuseSearch top ratings. The
full study can be downloaded at http://www.natlib.govt.nz/en/whatsnew/4initiatives.html#review.
2. Federated search can be software or a service.
The WebFeat article asserts that federated searching
is best when offered as a service, and that this is
the only approach that avoids downtime for software
or source connector updates. The truth is, a centralized
service is not necessary in order to incorporate frequent
software updates without downtime.
Our Source Factory distributes software and source
connector package updates seamlessly, allowing extremely
high levels of service with very little local administration
effort. Updates can be made automatically, without
service disruption. Most of our technology partners
(COMPanion, Endeavor, Innovative Interfaces, Mandarin,
Sirsi, etc.) offer both local software implementation
and hosted service options. Most customers opt for
a local software implementation. Our experience has
been that local customization and security requirements
are best served in this way. The bottom line is, the
best option is flexibility to implement in the way
that is most effective for each user.
3. You do get better results with a federated search engine.
The aim of MuseGlobal is to provide better results
with less effort. In general, you can get better
results with federated search than by using native
database search because, practically speaking, few
searchers would have the time or patience to do these
searches repetitively via individual search interfaces.
In the real world, federated searching can exponentially
improve the efficiency and quality of results.
We invite your readers to try Muse federated search
technologies for themselves with MuseSeek, our new
consumer-oriented Web metasearch engine (http://www.museseek.com).
Cheryl Wright
Vice President, Marketing
MuseGlobal, Inc.
Where in the World?
I have been a subscriber to Information Today for
some time now, and generally look forward to Barbara
Quint's articles, which are usually very pithy and
informative.
I must tell you, however, that I was a bit bothered
by something she wrote in her October 2003 [Up Front]
article, in which she referenced Earth Station 5 as
being "reportedly based in Palestine."
Perhaps I shouldn't assume that she is aware that
there currently is no nation or state in the world
by that name, which was last used by the British during
the Mandate period?
A quick Internet search revealed that Earth Station
5 is located in Jenin, one of the autonomous areas
under the control of the Palestinian Authority, which would
have been a more accurate way to describe the location.
I'm sure that, as a sophisticated journalist, she's aware
of the significance of names and of the importance
of accuracy. And to give her the benefit of the doubt,
I will assume the reference was an oversight and not
intentional, for to inject one's own politics into
a professional journal article is most unfortunate,
and unprofessional, as I'm sure you'd agree.
Thank you for your time and consideration.
Glenn Ferdman
Director, Asher Library
Spertus Institute of Jewish Studies
Chicago