Searcher
Vol. 9 No. 4 — April 2001
• FEATURE •
The Dotcom Directory: A Work in Progress
by Cecilia M. Preston, Preston & Lynch
Sun Microsystems' advertising claims to have put the dot in dot-com. The Oxford English Dictionary has added "dot-com" to its lexicon. It is no surprise, then, that there is a Dotcom Directory [http://www.dotcomdirectory.com or http://www.dotcom.com]. It is loosely based on the Network Solutions [http://www.networksolutions.com] database of registered domain names covering the dot-com, dot-org, and dot-net generic top-level domains (gTLDs). Recently, the Dotcom Directory was folded into the dotcom.com Web site, which carries articles and statistics on the Internet.

Research for this article began some 4 months ago. One of the greatest obstacles to finishing it was the continuing change at the site. At one point, midweek in November, between 1:30 and 3:00 PM PST, two new buttons appeared on the search dialog box. At the time, neither button connected to any data. The links went live sometime during the December holidays, and some of the data they return is quite current, e.g., today's stock closing price. Skepticism over which "today" was today was relieved after checking another site and finding closing price and volume numbers that matched.

So how did this portal come into being? And what can it do for the professional searcher?
 

Background — Network Solutions
Network Solutions was founded in 1979 and in 1995 was purchased by Science Applications International Corporation (SAIC). An early contract with the National Science Foundation [http://www.nsf.gov] made it a registrar for the .com, .net, .org, and .edu namespaces. As the U.S. government divested itself of the underwriting of much of the Internet's infrastructure in the late 1990s, Network Solutions was the sole registrar of .com, .net, and .org domain names. Network Solutions maintained the stability of the registration process while the privatization of other aspects of the Internet infrastructure was worked out. The most notable of these efforts was the creation of the Internet Corporation for Assigned Names and Numbers (ICANN) [http://www.icann.org]. In June 2000, Verisign acquired Network Solutions.

The need for a database of registered domain names was recognized soon after network protocols were codified. The earliest effort, RFC 742, Name/Finger (December 30, 1977), described a protocol that would "return a friendly, human-oriented status report on either the system at the moment or a particular person in depth." The WHOIS database was built to meet this need and remains the foundation on which dotcom.com rests.

With the growth of institutions using the Internet, tracking domain names and the individuals responsible for each institution's domain became more complicated. RFC 812, NICNAME/WHOIS, was published on March 1, 1982. This protocol moved information about a domain to a "query/response server" running at SRI, which would provide a "netwide directory service to ARPANET users. . . . The server is accessible across the ARPANET from user programs running on local hosts, and it delivers the full name, U.S. mailing address, telephone number, and network mailbox for ARPANET users." Although the preferred name for this protocol was NICNAME, the RFC notes that "some sites may choose to use the more familiar name of 'WHOIS.'" (ARPANET was the original network set up by the Defense Department and other federal agencies; it led to the creation of the Internet.)

In October 1985, RFC 954 replaced RFC 812. Besides updating the protocol the server ran, RFC 954 states, "This server [at SRI], together with the corresponding WHOIS Database can also deliver online look-up of individuals or their online mailboxes, network organizations, DDN nodes and associated hosts, and TAC telephone numbers. DCA requests that each individual with a directory on an ARPANET or MILNET host, who is capable of passing traffic across the DoD Internet, be registered in the NIC WHOIS Database." This basic information allowed users to identify an ARPANET user in 1985.
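
The exchange RFC 954 describes is very simple: the client opens a TCP connection to the server on port 43, sends a single query line ending in CRLF, and reads the reply until the server closes the connection. The Python sketch below shows that transaction. The function name `whois_query` is hypothetical, and the default host `whois.internic.net` is an assumption about a present-day WHOIS server, not the SRI host the RFC names.

```python
import socket

WHOIS_PORT = 43  # TCP port for NICNAME/WHOIS given in RFC 954


def whois_query(query: str, host: str = "whois.internic.net",
                port: int = WHOIS_PORT, timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the server's complete reply.

    Sketch of the RFC 954 transaction: connect over TCP, send the query
    line terminated by CRLF, then read until the server closes the
    connection. The default host is an assumption; any WHOIS server works.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: server closed, so the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query("dotcom.com")` would return whatever text the chosen server holds for that name; the protocol itself defines no record format, so parsing the reply is left to the caller.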

Paul Mockapetris wrote a number of RFCs from 1983 to 1987 that specified elements of the domain name system that remain in current use, the system under which dot-com names operate. The timeline in Keith Lynch's history of the Internet [http://keithlynch.net/timeline.html] dates the first dot-com to January 1985.

By 1995, the majority of the registered domain names were for dot-coms. The explosion of registered domain names after the Internet was opened to all types of traffic provided Network Solutions with a large store of valuable contact information.

The Internet Software Consortium [http://www.isc.org/ds/] estimates that as of July 2000 there were over 117 million .com, .net, and .org top-level domain (TLD) hosts, up from 2.245 million just 5 years before. Matrix.net [http://www.matrix.net/] estimates that there were 100 million dot-com hosts as of November 2000. The counts can grow dramatically as one goes down the domain name tree. For example, the largest second-level domain name is "lucent.com," with 7,045,706 third-level names beneath it; "outland.lucent.com" alone has 7,045,642 fourth-level names branching from it. Only second-level domain names are searchable in the Dotcom Directory.
 

The Directory
Network Solutions and InfoSpace [http://infospace.com] announced a marketing agreement in June of 1999. The press release stated the "upcoming Dotcom Directory . . . is designed as a definitive 'find engine' allowing users to quickly locate, research and do business with companies on or off the Web." Building on the WHOIS database, Network Solutions partnered with InfoUSA to provide additional data of relevance to the business information market.

Initially, getting listed in this directory was quite simple. If Network Solutions had registered the domain name and "a business is in the InfoUSA database of over 11 million U.S. and Canadian businesses or is not included in their consumer database," in you went, with or without your consent.

The announcement of the directory caused much unrest. The Commerce Department was concerned about what it saw as the private use of public data. Network Solutions saw the data in the directory as proprietary information generated from its client files, not from the shared WHOIS database. Besides that, some businesses that had registered their domain names with one of the newly formed registrars, rather than with Network Solutions, wanted to be listed. Finally, some firms that had gone into the directory by virtue of registering with Network Solutions wanted out.

The Commerce Department and competing registrars have ensured continued access to the zone files and the WHOIS data that Network Solutions administers. These files supply the data on which domain name look-ups depend. Upon registration, the information on the registrant's primary and secondary name servers goes into the zone file for the top-level domain. Andrew Pincus of the Commerce Department is quoted as saying that he did not object to the directory service, but to what appeared to be a restrictive policy that "effectively insulate[s] the 'Dot Com Directory' against any real competition."
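
To make the zone-file step concrete: in the master-file format of RFC 1035, delegating a second-level name from the .com zone amounts to one NS record per name server. The Python sketch below only formats those lines. The function name is hypothetical, TTLs are omitted, and the glue address records a registry adds when a name server lives inside the delegated domain are ignored.

```python
def delegation_records(domain: str, name_servers: list[str]) -> list[str]:
    """Format NS records that delegate `domain` from its TLD zone.

    Follows the RFC 1035 master-file layout: owner, class, type, data.
    Names are written fully qualified (trailing dot) and lower-cased.
    TTLs and glue A records are deliberately left out of this sketch.
    """
    owner = domain.rstrip(".").lower() + "."
    return [f"{owner}\tIN\tNS\t{ns.rstrip('.').lower()}." for ns in name_servers]


# Primary and secondary name servers as supplied at registration (example values).
for line in delegation_records("example.com", ["NS1.EXAMPLE.NET", "ns2.example.net."]):
    print(line)
```

Because every registrant's delegation sits in the same TLD zone file, continued access to that file is what lets competing registrars and directories see the full list of registered names.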
 

Getting into Or out of the Dotcom Directory
The "get listed" page of the Dotcom Directory announces that "It's free!" for qualifying businesses. As discussed above, a qualifying business is one that registered with Network Solutions. But what if you registered elsewhere? In that case, you could check your domain name on the "update your listing" page. If the domain name was registered by a Network Solutions affiliate, the response should be "Web address not found." Then, "if qualified," you could fill in an update request form, and your listing should become available in about 2 weeks. If the domain name is registered with some other registrar, however, there is no provision for inclusion in the directory at this time, according to a project manager.

To update a record, the owner of a domain name clicks on the "update" button in the Dotcom Directory box on the dotcom.com home page. This leads to a page that prompts for the domain name to be updated. The data on file for that domain name then appears, with adjoining boxes in which to enter the new data. An additional set of fields covers contact information. Previously, all requests for listing changes were verified by telephone and the listings updated within 45 days. A new listing management system should reduce the update turnaround time to 2 weeks.

To remove an entire listing, complete the contact information and click the "Remove My Listing" button. The same verification process is used.
 

Searching in the Directory
In the world of the print directory, the concept of accuracy as of some date is well established, but for an Internet directory, users expect currency. Quality issues for a more-or-less self-reporting directory are substantial.

The accuracy and timeliness of the information in a listing appear to rest primarily with the company being listed. For example, one company that changed its name over a year ago was still listed under its previous name. When asked about this, the party responsible for "dealing with Network Solutions" simply could not see the importance of keeping up the Network Solutions directory if that information did not affect the registration of domain names. Another sentiment heard when discussing this with colleagues was that past difficulty in getting changes made correctly left those responsible for domain names with little desire to "help Network Solutions sell advertising." The WHOIS data, although used to validate a company's eligibility for listing in the directory, remains a separate file: updating the base data in the WHOIS database will not update the same fields in the Dotcom Directory.

The number of company name changes, mergers and acquisitions, address changes, and companies that fold is huge. It is this very aspect of searching for current company information that gives this directory such promise.
 

Working with the Directory
Remember, this directory always seems to be under construction. The discussion below describes how things worked as of the end of January 2001.

There are four keys to locating company information: Company Name, Web Address, Business Type, and Ticker Symbol.

Search by Business Name
What do "Raphael and Associates," "Raphael & Associates," and "Raphael and Assoc" have in common? Simple. As of October, none of them was treated as equivalent to "Raphael & Assoc" in the directory.

But by January, "Raphael and Assoc." equaled "Raphael and Associates." In other words, if a searcher entered "Assoc." in a search statement, results reading "Associates" were returned. Even so, an "&" still did not retrieve names spelled with "and."

In October, company names were normalized on a case-by-case basis. For example, a search on "ATT" as a company name resulted in more than 300 hits. In the first 61 hits, the forms "AT&T" and "ATT" were interspersed. "AT&T Corp., 32 Avenue of the Americas, New York, NY" was the 22nd record displayed. A search on "AT&T" produced the same list for at least the first 65 records. Results were sorted strictly on the company name string; where the same string occurred more than once, records were displayed in the order they were retrieved from the database. A planned second-level ordering based on physical address became available in January.
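
The second-level ordering mentioned above amounts to a compound sort key: compare company names first, then break ties by location instead of by retrieval order. The Python sketch below assumes a hypothetical `Listing` record with an `is_headquarters` flag; the article does not say whether the directory actually stores such a flag, and the records are made up.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical record shape; the directory's actual schema is not documented.
    name: str
    city: str
    state: str
    is_headquarters: bool = False


def display_order(listings: list[Listing]) -> list[Listing]:
    """Order by company name, then headquarters first, then state and city.

    The secondary keys replace "whatever order the database returned" with a
    deterministic order for records that share the same name string.
    """
    return sorted(
        listings,
        key=lambda r: (r.name.casefold(), not r.is_headquarters, r.state, r.city),
    )


# Made-up example records, not data from the directory.
results = display_order([
    Listing("Example Corp.", "Denver", "CO"),
    Listing("Example Corp.", "New York", "NY", is_headquarters=True),
    Listing("Example Corp.", "Atlanta", "GA"),
])
print([r.city for r in results])  # ['New York', 'Denver', 'Atlanta']
```

Sorting on `not r.is_headquarters` works because `False` sorts before `True`, which puts the headquarters record at the top of each group of identical names.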

Furthermore, there was no consistency in the use of abbreviations or punctuation in the company name that could enable a searcher to learn from previous experience with the database.

A search for "D&B" in October produced another example of result ordering that could drive many searchers crazy. It appears that in some instances a space counts as punctuation and may or may not be removed before sorting. The first screen of results for "D&B" as a company name included the following versions of the name:

D B
D & B Accessories
D B Acoustics
D & B Agro-Systems
DB Alan
D & B Alarms
D. B. Anderson Technologies
. . .
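
The seven names above can be checked against two candidate sort keys. In the Python sketch below, which illustrates the idea rather than the directory's actual code, a key that strips every space and punctuation mark is consistent with the order observed, which is why "DB Alan" lands between "D & B Agro-Systems" and "D & B Alarms." A key that keeps word boundaries instead groups all the "D B" forms together and moves "DB Alan" after them. Both keys fall back on the original string to break ties.

```python
import re

# First screen of results for "D&B", in the order the article lists them.
D_AND_B = [
    "D B",
    "D & B Accessories",
    "D B Acoustics",
    "D & B Agro-Systems",
    "DB Alan",
    "D & B Alarms",
    "D. B. Anderson Technologies",
]


def squeeze_key(name: str) -> tuple[str, str]:
    """Drop every space and punctuation mark: "D & B" and "DB" both become "db"."""
    return ("".join(ch for ch in name.casefold() if ch.isalnum()), name)


def word_key(name: str) -> tuple[str, str]:
    """Turn each run of punctuation/spaces into one space, keeping word boundaries."""
    return (" ".join(re.sub(r"[\W_]+", " ", name.casefold()).split()), name)


print(sorted(D_AND_B, key=squeeze_key) == D_AND_B)  # True: matches the listed order
print(sorted(D_AND_B, key=word_key)[-1])             # DB Alan
```

Either key is consistent on its own terms; the trouble for a searcher is not knowing which one is in force.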

When last tried, a search on "ATT" retrieved only company names beginning with "ATT," with "ATTA Corp." following "ATT" in the display of results. A separate search for "AT&T" returned yet another 176 records, and the company associated with the domain name "att.com" did not appear within the first 40. Searching for "Apple Computer of Cupertino, CA" returned two pages of records, the last of which was the link to "apple.com."

When rerun in January, the "D & B" search returned a result set that clearly matched the search string. Matching was still literal, however, so "Dun & Bradstreet" did not return the same results as "Dun and Bradstreet." One can only hope that Network Solutions' programmers are already writing the code that will treat an "&" in a search as equivalent to "and" and then sort the output with the headquarters location at the top of the list.

The Dotcom Directory uses stop words and automatic right truncation in company name searches, according to Mike Cornell, Product Manager for the Dotcom Directory. He also indicated that a number of additional features to assist with name searches, such as left truncation and other wildcard options, were being worked on.
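
Stop words and right truncation together could account for the January behavior described above: if "and" is discarded and every query word is treated as a prefix, "Raphael and Assoc." matches "Raphael and Associates," while a query containing "&" still fails. The Python sketch below models name matching that way. The stop-word list and the word-by-word matching rule are assumptions, since the directory's actual algorithm is not documented; the `amp_as_and` flag shows the "&"/"and" equivalence hoped for above.

```python
import re

# Assumed stop-word list; the directory's actual list is not published.
STOP_WORDS = {"and", "the", "of"}


def tokens(text: str, amp_as_and: bool = False) -> list[str]:
    """Lower-case the text, split it into words, and drop stop words.

    A bare "&" is kept as a token of its own, so it is NOT treated as "and"
    unless amp_as_and is True.
    """
    text = text.casefold()
    if amp_as_and:
        text = text.replace("&", " and ")
    return [w for w in re.findall(r"&|\w+", text) if w not in STOP_WORDS]


def matches(query: str, name: str, amp_as_and: bool = False) -> bool:
    """Right truncation, word by word: each query word must be a prefix of
    the name word in the same position, starting from the first word."""
    q, n = tokens(query, amp_as_and), tokens(name, amp_as_and)
    return len(q) <= len(n) and all(w.startswith(p) for p, w in zip(q, n))


print(matches("Raphael and Assoc.", "Raphael and Associates"))             # True
print(matches("Raphael & Assoc", "Raphael and Associates"))                # False
print(matches("Dun & Bradstreet", "Dun and Bradstreet", amp_as_and=True))  # True
```

Under the same assumptions, a query of "ATT" matches "ATTA Corp." but not "AT&T," whose words are "at," "&," and "t," which is in line with the search results reported above.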

Search By Web Address
Web Address searching targets only second-level domain names. For example, a search on "lucent.com" results in a single record for Lucent Technologies headquarters. If one searches on a third-level domain string such as "outland.lucent.com," the system responds "search string id not found" and then asks if you want to "search WHOIS" or start a "new search."
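
Limiting Web Address searches to second-level names means a query such as "outland.lucent.com" has to be cut back to "lucent.com" before it can match. The Python sketch below shows that reduction. The helper names are hypothetical, and the sketch assumes a single-label top-level domain such as .com, .net, or .org; names under registries like .co.uk would need a public-suffix list instead.

```python
def domain_level(host: str) -> int:
    """Count labels: "com" is 1, "lucent.com" is 2, "outland.lucent.com" is 3."""
    return len(host.rstrip(".").split("."))


def second_level(host: str) -> str:
    """Keep only the last two labels, e.g. "outland.lucent.com" -> "lucent.com".

    Assumes a one-label TLD (.com/.net/.org); raises ValueError for a bare TLD.
    """
    labels = host.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"not a second-level or deeper name: {host!r}")
    return ".".join(labels[-2:])


for name in ["lucent.com", "outland.lucent.com", "WWW.Apple.COM."]:
    print(name, domain_level(name), second_level(name))
```

A search front end could apply `second_level` silently before the look-up, rather than answering a third-level query with "not found."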

Search by Business Type
The Business Type button allows searching by category. These categories are based on six-digit SIC codes and are assigned by the listing company. Network Solutions has no current plans to move to the North American Industry Classification System (NAICS); if the time comes when SIC codes no longer work for this purpose, it will develop a mapping scheme. Many of the dot-coms that this directory covers would seem better served by the finer granularity of the new system.

Search by Ticker Symbol
Since Dotcom Directory policies require a company to be "qualified" for entry and allow companies to opt out by filling in online forms, it should hardly come as a surprise that some ticker symbols yield no results. A response page indicating that the symbol is valid but the company is not listed in the database would certainly help searchers.
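
The suggested response page amounts to distinguishing three outcomes instead of two. In the Python sketch below, the function name, the result labels, and the two symbol sets are hypothetical placeholders, not anything the Dotcom Directory exposes; the set of valid symbols is assumed to come from an outside exchange feed.

```python
from enum import Enum


class TickerResult(Enum):
    LISTED = "company found in the directory"
    VALID_NOT_LISTED = "symbol is valid, but the company is not listed"
    UNKNOWN = "symbol not recognized"


def lookup_ticker(symbol: str, valid_symbols: set[str],
                  listed_symbols: set[str]) -> TickerResult:
    """Classify a ticker query into one of three outcomes.

    valid_symbols: every symbol known to trade (an assumed outside feed).
    listed_symbols: symbols of companies actually in the directory.
    """
    s = symbol.strip().upper()
    if s in listed_symbols:
        return TickerResult.LISTED
    if s in valid_symbols:
        return TickerResult.VALID_NOT_LISTED
    return TickerResult.UNKNOWN


# Placeholder symbols for illustration only.
valid, listed = {"AAA", "BBB"}, {"AAA"}
print(lookup_ticker("bbb", valid, listed).value)
```

Separating "valid but not listed" from "unknown" tells the searcher whether to look elsewhere for the company or to recheck the symbol.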
 

Extra Results
Once a company is located, the Dotcom Directory offers links to additional sources of information, some of which InfoUSA [http://infousa.com] provides. The links include:

  • Map & directions
  • Domain name record
  • Buy a credit report
  • A URL
  • More info
  • Company overview
  • Business classification
  • Ticker info (if available)
  • Business wire (if available)
Other information available on the dotcom.com home page includes:
  • Dot-com features
  • News articles
  • Internet news
  • Browse business categories
  • An Internet cartoon
  • Dot-com statistics
  • Dot-com market watch
  • U.S. map interface to state rankings in the dot-com economy
  • Stock look-up
  • A question of the week
The dotcom.com home page also offers:
  • News and Features: Dot Com Story, Article Library, This Month's Features
  • Facts and Stats: Quick Stats, Fun Facts, U.S. Market, International Market
  • Profiles and Trends: Business Market — Click/Brick & Mortar, Fortune 1000, Consumer Market, Vertical Market
  • Services: Research, Industry Profiles
  • Dot Com Humor
  • Dot Com Events
  • Access to a monthly newsletter


Conclusion
In October, when we looked at the Dotcom Directory, the system seemed to trade off accuracy for timeliness. Given the database from which the list of companies is drawn, the Dotcom Directory has great potential to become a valuable search tool. Anyone who has gathered business information from multiple sources knows the effort required to create consistent and coherent output.

The people working on the Dotcom Directory have made many positive strides toward the reliability and currency of the content in the last 4 months. But as with any project of this size, there is always more work to do.
 

Bibliography
Harrenstien, K. "Name/Finger." RFC 742. December 30, 1977. Available online via the IETF Web site.
Harrenstien, K., and V. White. "NICNAME/WHOIS." RFC 812. March 1, 1982. Available online via the IETF Web site.
Harrenstien, K., M. Stahl, and E. Feinler. "NICNAME/WHOIS." RFC 954 (obsoletes RFC 812). October 1985. Available online via the IETF Web site.
 
 

Cecilia M. Preston's e-mail address is cecilia@well.com.

© 2001