Sun
Microsystems' advertising claims to have put the dot in dot-com. The Oxford
English Dictionary has added "dot-com" to its lexicon. It is no surprise
then that there is a Dotcom Directory [http://www.dotcomdirectory.com
or http://www.dotcom.com]. It
is loosely based on the Network Solutions [http://www.networksolutions.com]
database of registered domain names covering the dot-com, dot-org, and
dot-net generic top-level domain (gTLD) categories. Recently, the Dotcom
Directory was folded into the dotcom.com Web site with its articles and
statistics on the Internet.
Research for this
article began some 4 months ago. One of the greatest obstacles to completing
the writing task was continuing changes at the site. At one point, mid-week
in November between 1:30 PM PST and 3 PM, two new buttons appeared on the
search dialog box. Neither of the buttons at that time connected with any
data. These links went live sometime during the December holidays. Some of the data
is quite current, e.g., today's closing stock price. Skepticism over which
"today" was today was relieved after checking another site and finding
closing price and volume numbers that matched.
So how did this
portal come into being? And what can it do for the professional searcher?
Background — Network Solutions
Network Solutions
was founded in 1979 and was purchased in 1995 by Science Applications
International Corp. (SAIC). An early contract with the National Science Foundation [http://www.nsf.gov]
made it a registrar for the .com, .net, .org, and
.edu
namespace. As the U.S. government divested itself of the underwriting
of much of the infrastructure of the Internet in the late 1990s, Network
Solutions became the sole registrar of the .com, .net, and
.org
domain names. Network Solutions maintained the stability of the registration
process while the privatization of other aspects of the Internet infrastructure
was worked out. The most notable of these efforts was the creation of the
Internet Corporation for Assigned Names and Numbers (ICANN) [http://www.icann.org].
In June 2000, Verisign acquired Network Solutions.
The need for a
database of registered domain names was recognized shortly after
the codification of network protocols. The earliest effort, RFC 742 Name/Finger,
December 30, 1977, described a protocol which would "return a friendly,
human-oriented status report on either the system at the moment or a particular
person in depth." The WHOIS database was built to accommodate this and
remains the foundation on which dotcom.com is built.
With the growth
of institutions using the Internet, the tracking of domain names and the
names of individuals responsible for dealing with an institution's domain
became more complicated. RFC 812 NICNAME/WHOIS was published on March 1,
1982. This protocol moved the location of information about a domain to
a "query/response server" running at SRI which would provide a "netwide
directory service to ARPANET users. . . . The server is accessible across
the ARPANET from user programs running on local hosts, and it delivers
the full name, U.S. mailing address, telephone number, and network mailbox
for ARPANET users." Although the preferred name for this protocol was NICNAME,
it is noted that "some sites may choose to use the more familiar name of
'WHOIS.'" (ARPANET was the name of the original networking system set up
by the Defense Department and other federal agencies that led to the creation
of the Internet.)
In October 1985,
RFC 954 replaced RFC 812. Besides updating the protocols that the server
was running, RFC 954 states, "This server [at SRI], together with the corresponding
WHOIS Database can also deliver online look-up of individuals or their
online mailboxes, network organizations, DDN nodes and associated hosts,
and TAC telephone numbers. DCA requests that each individual with a directory
on an ARPANET or MILNET host, who is capable of passing traffic across
the DoD Internet, be registered in the NIC WHOIS Database." This basic
information allowed users to identify an ARPANET user in 1985.
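The exchange described in RFC 812 and RFC 954 is simple enough to sketch: the client opens a TCP connection to the server on port 43, sends a single command line ending in a carriage return and line feed, and reads the reply until the server closes the connection. The Python below is a minimal sketch under those assumptions. The helper names are illustrative, and no particular server is assumed; the caller supplies the host name.

```python
import socket

WHOIS_PORT = 43  # service port assigned to NICNAME/WHOIS


def build_query(name: str) -> bytes:
    """Return the single command line a WHOIS client sends, ending in CRLF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois(server: str, name: str, timeout: float = 10.0) -> str:
    """Send one query to `server` and return the whole text reply.

    `server` is whichever WHOIS host the caller chooses; none is assumed here.
    """
    chunks = []
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(name))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

The reply is free-form text meant for a human reader, which is why the sketch returns it as one string rather than trying to parse fields.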
Paul Mockapetris
wrote a number of RFCs from 1985 to 1987 that specified elements of the
domain name services that remain in current use, including the dot-com.
In Keith Lynch's history of the Internet, the timeline [http://keithlynch.net/timeline.html]
dates the first citation of a dot-com as occurring in January 1985.
By 1995, the majority
of the registered domain names were for dot-coms. The explosion of registered
domain names after the Internet was opened to all types of traffic provided
Network Solutions with a large store of valuable contact information.
The Internet Software
Consortium [http://www.isc.org/ds/]
estimates that as of July 2000, there were a total of over 117 million
.com,
.net,
and .org top-level domain (TLD) hosts, up from 2.245 million just
5 years before. The Matrix.net [http://www.matrix.net/] estimates that
there were 100 million dot-com hosts as of November 2000. These numbers
grow exponentially as one goes down the domain name tree. For example,
the largest second-level domain name is "lucent.com," with 7,045,706 third-level
domain name branches; branching from "outland.lucent.com" alone reaches
7,045,642 fourth-level names. Only second-level domain names are searchable.
The Directory
Network Solutions
and InfoSpace [http://infospace.com]
announced a marketing agreement in June of 1999. The press release stated
the "upcoming Dotcom Directory . . . is designed as a definitive 'find
engine' allowing users to quickly locate, research and do business with
companies on or off the Web." Building on the WHOIS database, Network Solutions
partnered with InfoUSA to provide additional data of relevance to the business
information market.
Initially, getting
listed in this directory was quite simple. If Network Solutions had registered
the domain name and "a business is in the InfoUSA database of over 11 million
U.S. and Canadian businesses or is not included in their consumer database,"
in you went, with or without your consent.
Much unrest was
caused by the announcement of the directory. The Commerce Department was
concerned about what it saw as the private use of public data. Network
Solutions saw the data in the directory as proprietary information generated
from its client files, not the shared (WHOIS) database. Besides that, some
businesses wanted to be listed that had registered their domain names with
one of the newly formed registrars, not Network Solutions. Finally, there
were firms that by virtue of registering with Network Solutions went into
the directory but wanted out.
The Commerce Department
and competing registrars have ensured continued access to the zone files
and the WHOIS data that Network Solutions administers. These files provide
the data from which a domain name look-up occurs. Upon registration, the
information on primary and secondary name servers goes into a zone file
for the top-level domain of the registrant. Andrew Pincus of the Commerce
Department is quoted as saying that he did not object to the directory
service, but to what appeared to be a restrictive policy that "effectively
insulate[s] the 'Dot Com Directory' against any real competition."
Getting Into or Out of the Dotcom Directory
The "get listed"
page of the Dotcom Directory announces that "It's free!" for qualifying
businesses. As discussed above, a qualifying business is one that registered
with Network Solutions. But what if you registered elsewhere? In that
case, you can check your domain name on the "update your listing" page.
If the domain name was registered by a Network Solutions affiliate,
the response should be "Web address not found." Then, "if qualified,"
you can fill in an update request form, and your listing should become available
in about 2 weeks. If the domain name was registered with any other registrar,
there is no provision for inclusion in the directory at this time, according
to a project manager.
To update a record,
the owner of a domain name clicks on the "update" button in the Dotcom
Directory box on the dotcom.com home page. This leads to a page
that prompts for the domain name to be updated. The data on
file for that domain name appears with adjoining boxes in which to enter
the new data. An additional set of fields covers contact information. Previously,
all requests for listing changes were verified by telephone and the listings
updated within 45 days. A new listing management system should reduce the
update turn-around time to 2 weeks.
To remove an entire
listing, complete the contact information and click the "Remove My Listing"
button. The same verification process is used.
Searching in the Directory
In the world of
the print directory, the concept of accuracy as of some date is well established,
but for an Internet directory, users expect currency. Quality issues for
a more-or-less self-reporting directory are substantial.
The accuracy and
timeliness of information in a listing appear to rest primarily on the
company being listed. For example, one company that changed its name over
a year ago was still listed under its previous name. When asked about this,
the party responsible for "dealing with Network Solutions" simply could
not see the importance of keeping up the Network Solutions directory if
that information did not impact the registration of domain names. Colleagues
voiced another sentiment: past struggles to get changes processed correctly
had left those responsible for domain names with little desire
to "help Network Solutions sell advertising." The WHOIS
data, although drawn upon to validate a company's eligibility for listing
in the directory, remains a separate file. Updating the base data in the
WHOIS database will not update the same fields in the Dotcom Directory.
The number of company
name changes, mergers and acquisitions, address changes, and companies
that fold is huge. It is this very aspect of searching for current company
information that gives this directory such promise.
Working with the Directory
Remember, this
directory seems to always be under construction. The discussion below is
how things worked as of the end of January 2001.
There are four
keys to locating company information: Company Name, Web Address, Business
Type, and Ticker Symbol.
Search by Business Name
What do "Raphael
and Associates," "Raphael & Associates," and "Raphael and Assoc" have
in common? Simple. They were not equivalent to "Raphael & Assoc" as
of October in the directory.
But by January,
"Raphael and Assoc." equaled "Raphael and Associates." In other words,
if a searcher entered "Assoc." in a search statement, results reading "Associates"
were returned. However, an "&" will still not return the "and" string.
In October, company
names were normalized on a case-by-case basis. For example, a search on
"ATT" as a company name resulted in more than 300 hits. The first 61 hits
showed the form "AT&T" or "ATT" interspersed with each other. "AT&T
Corp., 32 Avenue of the Americas, New York, NY" was the 22nd record displayed.
A search on "AT&T" resulted in the same list for at least the first
65 records. In presentation, results sorted strictly on the company name
string. If duplicates of that string occur, records display as they are
retrieved from the database. Second-level ordering based on physical
address, planned at that time, became available in January.
Furthermore, there
was no consistency in the use of abbreviations or punctuation in the company
name that could enable a searcher to learn from previous experience with
the database.
A further search
for "D&B" in October produced another example of result-set ordering
that could drive many searchers crazy. It appears that in some instances
a space constitutes punctuation and may or may not be removed prior to
sorting. From the first screen of results from the search "D&B" as
a company name came the following versions of a company name:
D B
D & B Accessories
D B Acoustics
D & B Agro-Systems
DB Alan
D & B Alarms
D. B. Anderson Technologies
. . .
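One way to read this ordering is that the sort key is the company name with every space and punctuation mark removed. The Python below is a minimal sketch of that guess, not the directory's actual collation rule, which was not published; the function name `squeezed_key` is made up for illustration. Sorting the seven names above on such a key reproduces the order in which they were displayed, while a plain character-by-character sort does not.

```python
import re


def squeezed_key(name: str) -> str:
    """Lowercase the name and drop everything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Company names from the first results screen, in the order displayed.
observed = [
    "D B",
    "D & B Accessories",
    "D B Acoustics",
    "D & B Agro-Systems",
    "DB Alan",
    "D & B Alarms",
    "D. B. Anderson Technologies",
]

# Start from the reverse order so the sort has real work to do.
by_squeezed_key = sorted(reversed(observed), key=squeezed_key)
by_raw_string = sorted(observed)

print(by_squeezed_key == observed)  # True
print(by_raw_string == observed)    # False: "D & B Accessories" would come first
```

Seven names are thin evidence, so this shows only that the display is consistent with such a key, not that the directory used one.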
When last tried,
a search on "ATT" retrieved only those company names beginning
with "ATT," with "ATTA Corp." following "ATT" in the display of results.
A separate search for "AT&T" results in yet another 176 records without
the company associated with the domain name "att.com" appearing within
the first 40 records. Searching for "Apple Computer of Cupertino, CA" resulted
in two pages of records with the last one representing the link to "apple.com."
When rerun in January,
the "D & B" search returned a result set which clearly matched the
search string. Thus, "Dun & Bradstreet" did not return the same results
as "Dun and Bradstreet." One can only hope that Network Solutions' programmers
are already writing the piece of code that will see a search with an "&"
as equivalent to "and" and then sort the output with the headquarters location
at the top of the list.
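The first half of that wish, treating "&" as equivalent to "and," takes very little code. The Python below is a minimal sketch of one way to do it, assuming a small equivalence table applied to both the search string and the stored name before they are compared. The table contents and the function name are illustrative and say nothing about how Network Solutions' own software works.

```python
import re

# Illustrative equivalence table; the directory's real rules are not documented here.
EQUIVALENTS = {"&": "and", "assoc": "associates"}


def normalize(name: str) -> str:
    """Map a company name to a canonical form for comparison."""
    # Keep runs of letters/digits and the ampersand; periods, commas, etc. drop out.
    tokens = re.findall(r"[a-z0-9]+|&", name.lower())
    return " ".join(EQUIVALENTS.get(tok, tok) for tok in tokens)


print(normalize("Raphael & Assoc"))      # raphael and associates
print(normalize("Raphael and Assoc."))   # raphael and associates
print(normalize("Dun & Bradstreet") == normalize("Dun and Bradstreet"))  # True
```

Sorting the headquarters record to the top would need a field marking which location is the headquarters, so it is left out of the sketch.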
The Dotcom Directory
uses stop words and automatic right truncation in the corporate name searches,
according to Mike Cornell, Product Manager for the Dotcom Directory. He
also indicated that there were a number of additional features being worked
on to assist with name searches, such as left truncation and other wild
card options.
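Stop words and automatic right truncation can be pictured as follows. The Python is a minimal sketch, assuming that right truncation means the search string only has to match the beginning of the company name and that stop words are dropped from both sides first. The stop-word list and function names are invented for illustration; the directory's actual list was not published.

```python
# Assumed stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "inc", "corp", "co"}


def search_key(text: str) -> str:
    """Lowercase, strip edge punctuation from each word, drop stop words."""
    words = (w.strip(".,") for w in text.lower().split())
    return " ".join(w for w in words if w and w not in STOP_WORDS)


def matches(query: str, company_name: str) -> bool:
    """Right truncation: the query key only has to match the start of the name key."""
    q = search_key(query)
    return bool(q) and search_key(company_name).startswith(q)


print(matches("ATT", "ATTA Corp."))                             # True
print(matches("Raphael Assoc", "The Raphael Associates Inc."))  # True
print(matches("Lucent", "AT&T Corp."))                          # False
```

Under this reading, a search on "ATT" retrieving "ATTA Corp." is expected behavior rather than a defect.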
Search by Web Address
The Web Address
button searches only second-level domain names. For example,
a search on "lucent.com" results in a single record for Lucent Technologies
headquarters. If one searches on a third-level domain string such as "outland.lucent.com,"
the system responds "search string id not found" and then asks if you want
to "search WHOIS" or start a "new search."
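A searcher holding a longer host name can trim it to the searchable form before trying the Web Address button. The Python below is a minimal sketch that keeps the last two labels of a name. It assumes a name under .com, .net, or .org, where the registered name is always two labels long; country-code domains would need more care. The function name is illustrative.

```python
def second_level_domain(host: str) -> str:
    """Return the last two labels of a host name, e.g. the registered
    name under .com, .net, or .org."""
    labels = host.strip().rstrip(".").lower().split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"not a usable domain name: {host!r}")
    return ".".join(labels[-2:])


print(second_level_domain("outland.lucent.com"))  # lucent.com
print(second_level_domain("lucent.com"))          # lucent.com
```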
Search by Business Type
The Business Type
button allows for the searching of categories. These categories are based
on six-digit SIC codes and are assigned by the listing company. Network
Solutions has no current plans to move to the North American Industry Classification
System (NAICS). If the time comes when SIC codes no longer work for this
purpose, Network Solutions will develop a mapping scheme. It seems that
many of the dot-coms that this directory covers would be better served
by the granularity of the new system.
Search by Ticker Symbol
Since the Dotcom
Directory policies require a company to be "qualified" for entry and allow
companies to opt out by filling in online forms, it should hardly come
as a surprise that some ticker symbols do not yield results. The inclusion
of a return page that indicates that the symbol is valid but the company
is not listed in the database would certainly help searchers.
Extra Results
Once one locates
a company, the Dotcom Directory carries links to additional sources of
information. InfoUSA [http://infousa.com] provides some of the information
for these data elements. These include links:
- Map & directions
- Domain name record
- Buy a credit report
- A URL
- More info
- Company overview
- Business classification
- Ticker info (if available)
- Business wire (if available)
Other information
available on the dotcom home page includes:
- Dot-com features
- News articles
- Internet news
- Browse business categories
- An Internet cartoon
- Dot-com statistics
- Dot-com market watch
- U.S. map interface to state rankings in the dot-com economy
- Stock look-up
- A question of the week
The dotcom.com home
page also offers:
- News and Features: Dot Com Story, Article Library, This Month's Features
- Facts and Stats: Quick Stats, Fun Facts, U.S. Market, International Market
- Profiles and Trends: Business Market — Click/Brick & Mortar, Fortune 1000, Consumer Market, Vertical Market
- Services: Research, Industry Profiles
- Dot Com Humor
- Dot Com Events
- Access to a monthly newsletter
Conclusion
In October, when
we looked at the Dotcom Directory, the system seemed to trade off accuracy
for timeliness. Given the database from which the list of companies is
drawn, there is a great potential for the Dotcom Directory to become a
valuable search tool. Anyone who has gathered business information from
multiple sources knows of the effort required to create consistent and
coherent output.
The people working
on the Dotcom Directory have made many positive strides towards the reliability
and currency of the content in the last 4 months. But like all projects
of this size, there is always more work to do.
Bibliography
Harrenstien, K. Name/Finger. RFC 742. December 30, 1977. Available online via the
IETF Web site.
Harrenstien, K., and White, V. NICNAME/WHOIS. RFC 812. March 1, 1982. Available online via
the IETF Web site.
Harrenstien, K., Stahl, M., and Feinler, E. NICNAME/WHOIS. RFC 954 (obsoletes RFC 812). October
1985.
Cecilia M. Preston's
e-mail address is cecilia@well.com.