INVESTIGATIVE REPORT: On the TREC Trail
 By Barbara Quint
 
 Come, my children, and gather round the campfire. Listen
  to the old ones tell the tales of the long-forgotten ancient times and of the
  great deeds done when the world was young. Long before the coming of Google,
  before the howling of Yahoo!, before even the rise (and fall and rise again)
  of the dot-coms, an Internet was born. And the midwives at the birthing of
  the infant that would one day rise to hold the world in its mighty Web were
  ARPA (now known as DARPA, the Defense Advanced Research Projects Agency) and
  the National Science Foundation (NSF). Never forget, child clad in the red,
  white, and blue, that, whether for good or ill, 'twas the federal government
  of the United States of America that first brought the Internet to all the
  peoples of the earth.
  Even now, when the moon is full and the wind moves the clouds across its
  shining face, the ghosts of the ancients meet and ponder their creation and
  judge its works. Even now, if you listen silently, you can hear them whisper
wisdom on distances traveled and distances yet unspanned.

In other words, each year the National Institute of Standards and Technology
  (NIST; http://www.nist.gov) gathers with DARPA (http://www.darpa.mil) and the
  Advanced Research and Development Activity (ARDA;  http://www.ic-arda.org),
  a friendly, outgoing representative of the "intelligence community." (Can crocodiles
  really smile?) Together they sponsor a text retrieval conference known as TREC.
  The conference workshops evaluate the efforts of participants to complete difficult
tests designed to advance text retrieval systems in different problem categories.

Real Questions

Working searchers should have a natural affinity for TREC's approach to improving
  text retrieval. It tests systems the only way they should be tested. It takes
  real questions, even some supplied by virtual reference operations, and real
  problems, like detecting the novel or finding relevant information in non-English
documents. It runs the tasks against real data: for example, a million-plus
  collection of full-text articles from major news organizations such as The
  New York Times and the Xinhua news agency wires. TREC even tests systems
by their ability to admit failure; in other words, the answer sought did
  not exist in the information set. One year they even required systems to assign
  confidence values to the answers detected. TREC tracks also deal with scalability
factors in solutions.

In the question-answering category, questions can extend from "factoids," like
  the name of the river called the "Big Muddy," to layered questions, like a
  list of chewing gum manufacturers. Other areas of investigation involve tasks
  that require judgment, like evaluating which articles take a different stance
  on an issue or what specific information new articles contribute to a breaking
  story. From the very first conference in 1992, TREC (known as Tipster back
  then) focused on answers found rather than documents retrieved. It also called
  for systems that would work beyond the English language or answer spoken questions.
  Humans assess the success of the automated retrieval processes and hold to
  strict standards. Other elements of the search process also receive careful
  evaluation. Search systems may even get pop quizzes on questions that have
appeared in previous years' conferences.

Participants in the testing mainly come from universities (well over half,
  according to Ellen Voorhees, TREC project manager at NIST), but commercial
  operations (such as Microsoft) can also participate. In any case, as anyone
  saving pennies to buy into Google's IPO knows, universities very often nurture
  the future talent that can take the information industry by storm. A quick
  look at the TREC conference questions and the standards of success imposed
  would convince any information professional that a winner (or even a placer
or show-er) would be a player to watch in the future.

The Novelty Issue

TREC workshops build around tracks representing specific problem areas. Results
  from tracks may differ from year to year. Current tracks cover cross-language
  issues; filtering; genomics as a specific domain; HARD (High Accuracy Retrieval
from Documents); interactive or user transaction issues; novelty, or how to
locate new, previously unfound information; question answering; robust retrieval;
and terabyte, or scaling to larger document collections. The novelty issue also
  receives attention from related but non-TREC research conducted by DARPA called
  Topic Detection and Tracking. TREC recently added a video track focusing on
  content retrieval in digital video, which should expand into a general multimedia
  track. TREC also has a Web track that works with a snapshot of the Web as a
document set for search engines.

Voorhees discussed the conference's role in advancing text retrieval services.
  She pointed to participation in TREC as a grounding for future start-ups coming
  out of academic settings. Most of the corpus that TREC uses for its testing
comes from newspapers or news wires (often contributed at no charge) and
government documents. The TREC organizers have no direct plans for following the scholarly
  communication field (for example, collections of "open access" scholarship)
  due to the difficulties of locating the talent required to evaluate success
  rigorously. However, the new genomics track established last year does tap
  into the National Library of Medicine's PubMed collection of text. In this
area, an NSF grant helps fund the judging process.

One scholar of the search field pointed to TREC's developments (or lack
of development) over the last 4 or 5 years as evidence that improvements
  in search have hit a "glass ceiling." It has become harder and harder to crack
  the final steps to complete answer extraction. As computerized retrieval improves,
  more and more participants reach the level of previous peak performers, but
  none of the performers seem to move beyond the point where almost all are clustered
now.

Voorhees admitted that success in the question-answering category has leveled
  off in the last few years in the area of traditional ad hoc tests, i.e., new
  queries seeking document-based answers. In this area, top scores have shown
  little improvement. Some of the failure she attributes to the diversion of
  effort to the research developments needed to meet new tasks and tracks generated
  at TREC. However, she also admitted that nobody yet has appeared at TREC with
  the brilliant insights that will take us to the next state of the art. Natural
  language processing, according to Voorhees, is a hard problem to beat, especially
  if you insist it operate as effectively as a Mr. Spock computer dialog. The
  challenge involves teaching machines the tremendous world knowledge and basic
  understanding built into human intelligence. Nonetheless she is hopeful and,
  when such brilliant breakthroughs finally occur, she expects that some of the
first glimmers will appear at TREC.

Major Progress

In areas outside question-answering, Voorhees has seen major progress. For
  example, cross-language retrieval has become reliable for major languages,
  not something one could have said 10 years ago. Speech retrieval has also made
  significant progress. In fact, Voorhees believes it has reached the point where
it can become usable in large-scale services.

Oddly, Web search engines, such as Google or Yahoo! Search, do not participate
  in TREC, although Voorhees assures us that they do follow TREC closely. She
  believes that the Web search engines often focus on different problems, such
  as spidering and large database management issues. However, with the rising
  interest in "answer products," as outlined in Microsoft's new anti-Google strategizing
  and demanded by the small screens of wireless technology, it would seem that
  TREC's approach would have more interest now than ever before. Voorhees said
she would not be surprised if the requirement that all TREC research be published
openly contributed to that policy of nonparticipation. However, the research
arm of Microsoft has participated in TREC in the past.

Speaking of proprietary interests, we asked Voorhees about the role of the
  intelligence community, particularly ARDA, the latest TREC sponsor. Both ARDA
  and DARPA primarily support TREC through monetary contributions, Voorhees said.
  She admitted that the intelligence community was undoubtedly doing its own
  research, designed to answer its own very real needs, and that such research
might have a while to wait before it saw the light of day.

Lack of Publicity

In Internet time, 10 years counts as a century, at least. So, from one century
  to the next, TREC conferences have sought to find and promote the creation
  of text retrieval systems that do what real people want done and not just shove
  masses of text at people based on a "there's a pony in there somewhere" assumption.
  The conferences have held developers to the grindstone of the state of the
  art, not just the acceptable state of the market. They have helped developers
  meet and share experiences with other developers in an experts-only, shirtsleeve
  working environment. The only defect I can see in their strategy is one seen
  all too often by students of government information science and technology
projects: lack of publicity.

Well, that stops now. If you want to look at the conferences, you will find
  a complete listing at http://trec.nist.gov/pubs.html. Most of the proceedings
  are available for downloading. The 2004 TREC conference will be held this November
  at NIST in Gaithersburg, Md., but it is only open to participants. However,
  in February 2005, NIST will publish the proceedings on the Web site. And this
  reporter/editor, for one, expects to produce copy on the 2004 conference as
soon as humanly possible.

To hell with the bushel; let's see the light.
 Barbara Quint is editor of Searcher magazine. Her e-mail address is
 bquint@mindspring.com.
 