Information Today
Volume 19, Issue 5 — May 2002
NEC Research Institute Reports that the Web Is Self-Organizing

The NEC Research Institute, Inc. reports that its scientists have discovered that the Web, despite the Internet's decentralized and seemingly unorganized nature, is in fact naturally self-organizing. According to the announcement, this discovery led the scientists to develop an algorithm that may change the way companies segment and target specific online audiences.

The complete findings of the NEC Research Institute are published in the March 2002 issue of the IEEE Computer Society's Computer magazine.

The scientists' research shows that the Internet's structure of "clickable" links within Web pages allows for the identification of communities based on specific topics of interest. These communities are considered natural in that independently authored pages collectively organize them. This research is particularly significant given that no central authority or process governs the formation and structure of Web pages and links.

After confirming the Internet's self-organizing properties, Gary W. Flake and Steve Lawrence, research scientists at the NEC Research Institute, teamed with C. Lee Giles, a professor at Penn State University, and Frans M. Coetzee, chief technical officer of GenuOne, to develop the community algorithm. This algorithm enables businesses and individuals to zero in on specific information by focusing on communities of Web pages that are related to one another.

For example, an individual wishing to study the latest scientific findings on breast cancer research is able to locate medical literature, treatments, and new developments without wading through the pages of irrelevant material that a normal Internet search on the subject might produce. According to the announcement, this is possible because NEC's algorithm utilizes link information to generate its results, rather than specific text that may appear on countless Web pages.

"We have found that a Web author's creation of a specific Web link is a stronger indication of relevance than the implied relevancy generated by simple textual phrase and structure matching," said Flake. "Additionally, separating link structure from content facilitates using content-based similarity measures to independently validate the relevancy of the results that our algorithm produces."

In addition to enabling companies to more effectively target key audiences, NEC believes that other applications of its methodology include improved search engines, content filtering, and objective analysis of Internet content and of the relationships between Web communities.

"Our process holds the potential for the development of specialized search engines capable of identifying only pages within their domains," said Flake. "In addition, this development may lead to the creation of Web filtering software that identifies certain communities of pages to be filtered for either relevant or undesirable content."

The community algorithm takes a set of base Web sites as input and identifies a larger community of Web pages that contains them. NEC researchers define a Web community as a collection of Web pages that have more links within the community than to pages outside of it. Thus, each member of the identified community will typically be focused on a single topic regardless of textual ambiguities.
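The community definition above can be illustrated with a minimal Python sketch. It is not the NEC researchers' published method, which this article does not describe. It assumes that links are treated as undirected, that a page belongs when it has strictly more links inside the group than outside, and that the community is grown greedily from the base pages. The function names and the toy page names are hypothetical.

```python
from collections import defaultdict


def build_undirected(links):
    """Treat each hyperlink (src, dst) as an undirected edge (an assumption)."""
    graph = defaultdict(set)
    for src, dst in links:
        if src != dst:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def is_community(graph, members):
    """True if every member has more links inside the set than outside it."""
    members = set(members)
    for page in members:
        inside = len(graph[page] & members)
        outside = len(graph[page] - members)
        if inside <= outside:
            return False
    return True


def grow_community(graph, seeds):
    """Greedy illustration: repeatedly add any neighboring page that has
    more links into the current community than out of it."""
    community = set(seeds)
    changed = True
    while changed:
        changed = False
        # Pages linked to the community but not yet part of it.
        frontier = set().union(*(graph[p] for p in community)) - community
        for page in frontier:
            inside = len(graph[page] & community)
            outside = len(graph[page] - community)
            if inside > outside:
                community.add(page)
                changed = True
    return community


# Toy link graph: two tightly linked triangles joined by one link (c -> d).
links = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
graph = build_undirected(links)
community = grow_community(graph, {"a", "b"})
print(sorted(community))
```

Starting from the base pages "a" and "b", the sketch pulls in "c" (two links inside, one outside) and stops at "d", which has more links to its own triangle than to the growing community. Page text is never consulted, matching the article's point that the results come from link information rather than text.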

Source: NEC Research Institute, Princeton, NJ, 609/520-1555; http://www.neci.nj.nec.com.

© 2002 Information Today, Inc.