The NEC Research Institute, Inc. reports that, despite the Internet's
decentralized and seemingly unorganized nature, NEC scientists have discovered
that the Web is in fact naturally self-organizing. According to the announcement,
this discovery led the scientists to develop an algorithm that may change
the way companies segment and target specific online audiences.
The complete findings of the NEC Research Institute are published in
the March 2002 issue of the IEEE Computer Society's Computer magazine.
The scientists' research shows that the Internet's structure of "clickable"
links within Web pages allows for the identification of communities based
on specific topics of interest. These communities are considered to be
natural in that independently authored pages collectively organize them.
This research is particularly significant given the fact that no central
authority or process governs the formation and structure of Web pages and
links.
After confirming the Internet's self-organizing properties, Gary W. Flake
and Steve Lawrence, research scientists at the NEC Research Institute,
teamed with C. Lee Giles, a professor at Penn State University, and Frans
M. Coetzee, chief technical officer of GenuOne, to develop the community
algorithm. This algorithm enables businesses and individuals to zero in
on specific information by focusing on communities of Web pages that are
related to one another.
For example, an individual wishing to study the latest scientific findings
on breast cancer research is able to locate medical literature, treatments,
and new developments without wading through the pages of irrelevant material
that a normal Internet search on the subject might produce. According to
the announcement, this is possible because NEC's algorithm utilizes link
information to generate its results, rather than specific text that may
appear on countless Web pages.
"We have found that a Web author's creation of a specific Web link is
a stronger indication of relevance than the implied relevancy generated
by simple textual phrase and structure matching," said Flake. "Additionally,
separating link structure from content facilitates using content-based
similarity measures to independently validate the relevancy of the results
that our algorithm produces."
In addition to enabling companies to target key audiences more effectively,
NEC believes that other applications of its methodology include improved
search engines, content filtering, and objective analysis of both Internet
content and the relationships between Web communities.
"Our process holds the potential for the development of specialized
search engines capable of identifying only pages within their domains,"
said Flake. "In addition, this development may lead to the creation of
Web filtering software that identifies certain communities of pages to
be filtered for either relevant or undesirable content."
The community algorithm takes a set of base Web sites as input and identifies
a larger community of Web pages that contains them. NEC researchers define
a Web community as a collection of Web pages that have more links to pages
within the community than to pages outside it. Thus, each member of the
identified community will typically be focused on a single topic regardless
of textual ambiguities.
Source: NEC Research Institute, Princeton, NJ, 609/520-1555; http://www.neci.nj.nec.com.