On The Net 
                        Dating the Web: The Confusion of Chronology                         
                        By Greg R. Notess 
                        Reference Librarian, Montana State University
 
                        For those of us grounded in the print world of publishing, 
                        the date of publication helps identify and distinguish 
                        different editions, specific periodical issues, and even 
                        re-printings. The date of print publication rarely matches 
                        the exact date of composition but does have the distinct 
                        advantage of being basically unchangeable. Once a work has
                        been published with a publication date on each item,
                        the only way authors can update or otherwise make a change 
                        is to issue a new edition or a correction. Otherwise, 
                        they would have to track down every single copy and make 
                        the change.  
                         
                        The Web, of course, is pretty much the opposite. Site 
                        owners can change any page any time they wish. Unscrupulous
                        but talented hackers sometimes can even change other people's
                        pages. And anyone who has ever tried to cite a Web page 
                        knows that many have no publication date information listed 
                        at all.
                         
                        Yet for all its newness as a publication medium, the Web 
                        is aging. We have had public Web sites up for more than 
                        a decade now, though few pages remain in their original 
                        format. As the Web ages, it becomes increasingly important
                        to try to understand the origination date of certain
                        Web content. For intellectual property cases and the historical 
                        record, among other reasons, it can be important to know 
                        when a Web page was actually written or first posted. 
                        Exploring the Internet dating scene for the information 
                        professional means understanding the dimensions, deficiencies, 
                        and differences of the various dates associated with Web 
                        pages.
DATING DIMENSIONS
                        With the ease of posting a Web page, which is then 
                          publicly available, and the subsequent ease of changing 
                          that page, issues of date information have several dimensions. 
                          There is the original content creation date and possibly 
                          an editing or updated date. The surrounding text and 
                          graphics may come from an entirely different day and 
                          time while the page design may have occurred at yet 
                          a different point.  
                           
                          The date and time when the file containing this conglomeration 
                          of parts was last changed are reported in the date stamp. 
                          Any time a file on a computer is changed, a new date 
                          and time stamp, based on the computer's internal clock, 
                          is recorded.  
                           
                          Take for example an article written in 1998 that may 
                          have been uploaded to a Web page. The links in it may 
                          have been updated in 2000, while the page was redesigned 
                          with new surrounding logo graphics in 2002. Then the 
                          whole site was redesigned using a new content management 
                          system in 2004, resulting in the date stamp being updated 
                          to report the current year's date. Yet the bulk of the 
                          content of the article is still 6 years old, and the 
                          links have not been updated in 4 years.  
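
The arithmetic in this example can be written out as a small Python sketch. The class and field names are hypothetical, chosen only to label the dimensions described above, and only the years from the example are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageDates:
    """The separate dates attached to one Web page (hypothetical field names)."""
    content_written: int   # year the article itself was written
    links_updated: int     # year the links were last checked
    page_redesigned: int   # year the surrounding graphics changed
    date_stamp: int        # year reported by the file's date stamp

    def content_age(self) -> int:
        """Years between the writing and what the date stamp claims."""
        return self.date_stamp - self.content_written

    def link_age(self) -> int:
        """Years between the last link update and the date stamp."""
        return self.date_stamp - self.links_updated


article = PageDates(content_written=1998, links_updated=2000,
                    page_redesigned=2002, date_stamp=2004)
print(article.content_age(), article.link_age())
```

The date stamp alone says 2004; the other three fields are what a careful reader has to reconstruct from other evidence.
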
                        DATE DEFICIENCIES 
                        The previous example shows the problems with dating 
                          Web content. Most articles published on the Web by news 
                          media and periodical publishers have fairly obvious 
                          creation dates posted along with the article. Many Gannett 
                          papers include an "originally published" date 
                          and label at the bottom of each article. Their URLs also
                          include the year, month, and day of the original publication. 
                           
                           
                          However, other news publications often have no date 
                          listed in the article or in the URL. Still others put 
                          the current day's date on the top of every page, even 
                          when the articles were obviously published earlier. 
                          Alternatively, some list a "posted on" date. 
                          This may or may not be the same date as the date of 
                          the article's newspaper publication.  
                           
                          Beyond articles, plenty of other Web pages include some 
                          kind of a date. Far too often, it is only a small copyright 
                          notice at the bottom of the page. Typically the current 
                          year or a range of years such as 1995-2004 is listed. 
                          The problem with this date statement is that, on many 
                          sites, it may just be part of a standard footer on every 
                          page. Check other pages on the same site to verify the 
                          use of a standard footer. If every page has the same 
                          copyright statement at the bottom, then it is likely 
                          just a site-wide copyright statement.  
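
The footer check described here can be roughed out in Python. This is only a sketch: the regular expression is a simple guess at what a copyright line looks like, the function names are hypothetical, and the page text is assumed to have been saved already as plain strings.

```python
import re

# A loose pattern for a copyright line that contains a four-digit year.
COPYRIGHT_RE = re.compile(r"(?:©|\(c\)|copyright)[^\n]*\d{4}[^\n]*", re.IGNORECASE)


def copyright_lines(page_text: str) -> list[str]:
    """Return every line fragment that looks like a copyright notice."""
    return [match.group(0).strip() for match in COPYRIGHT_RE.finditer(page_text)]


def looks_site_wide(pages: dict[str, str]) -> bool:
    """True when several pages all carry exactly the same copyright notice."""
    notices = [tuple(copyright_lines(text)) for text in pages.values()]
    return len(pages) > 1 and all(notices) and len(set(notices)) == 1


sample = {
    "/about": "About us\nCopyright 1995-2004 Example Co.",
    "/news": "Latest news\nCopyright 1995-2004 Example Co.",
}
print(looks_site_wide(sample))
```

A result of True suggests a standard footer, so the year range says little about when any single page was written.
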
                           
                          Many other pages list no date information at all. In 
                          this case, checking for a date stamp on the page may 
                          be helpful. In Internet Explorer (IE), click "File" 
                          on the drop-down menu and then click "Properties." 
                          In Netscape, Mozilla, or Firefox, use "View/Page
                          Info" or the keyboard shortcut of CTRL+I. Just
                          remember that, if accurate, this is only the date the 
                          page was last changed. The actual writing and posting 
                          of the content may have occurred much earlier.  
                        DATING DIFFICULTIES
                        Unfortunately, determining the date of a page can be 
                          even more difficult. The date stamp is not reported 
                          on many pages. For sites which use SSI, ASP, PHP, or 
                          other server-side scripting languages (or use some content 
                          management systems), the date stamp on all the pages 
                          will always be the current day and time.  
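
The date stamp a browser displays normally comes from the Last-Modified header that the Web server sends with the page. The Python sketch below reads that header directly, on the assumption that a missing header means the server reported no date stamp. The function names are hypothetical, and `fetch_date_stamp` needs a live network connection.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def parse_date_stamp(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified header as a datetime, or None if absent or unreadable."""
    raw = next(
        (value for name, value in headers.items() if name.lower() == "last-modified"),
        None,
    )
    if raw is None:
        return None  # often the case for script-generated pages
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None  # header present but not a recognizable date


def fetch_date_stamp(url: str) -> Optional[datetime]:
    """Send a HEAD request and read the date stamp, if the server reports one."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return parse_date_stamp(dict(response.headers.items()))
```

A result of None matches the situation described above: the server simply did not report when the file last changed.
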
                           
                          Even for those pages that do have a date stamp, various 
                          versions and installations of Internet Explorer may 
                          not display the correct date stamp. One solution is 
                          to use a "Show Date" bookmarklet. Simply add
                          a new "Favorite" in IE with a name such as
                          "Show Date" and, instead of a URL, enter
                          javascript:alert(document.lastModified)
                          as the address. This will help if the regular
                          "File/Properties" approach does not work properly. 
                          As an alternative, just check the page in Netscape, 
                          Mozilla, or Firefox and use the "Page Info" 
                          display.  
                           
                          Tricks like the bookmarklet do not help for pages that 
                          either do not show a date stamp or, more commonly, just 
                          give the current date. Except for pages that are obviously 
                          updated on a daily basis, never trust a date stamp that 
                          matches the current date. Instead, look for other ways 
                          to establish a publication date.  
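
This rule of thumb can be expressed as a short Python check. It is a sketch only: the one-day window is an arbitrary assumption, the function name is hypothetical, and pages that really are updated daily still need human judgment.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stamp_is_plausible(
    stamp: Optional[datetime],
    checked_at: datetime,
    window: timedelta = timedelta(days=1),
) -> bool:
    """Return False for a missing stamp or one that merely echoes the time of the check."""
    if stamp is None:
        return False
    # A stamp within the window of "now" is treated as the current date.
    return abs(checked_at - stamp) > window


now = datetime(2004, 6, 15, 14, 30, tzinfo=timezone.utc)
print(stamp_is_plausible(now - timedelta(seconds=5), now))    # stamp echoes the check time
print(stamp_is_plausible(datetime(2002, 3, 1, tzinfo=timezone.utc), now))
```
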
                        DATE SEARCHING
                         A variety of search options can help identify the 
                          date of some Web pages. Checking in with the search 
                          engines brings up one more date to add to the confusion. 
                          When a search engine sends out its spider to index the 
                          Web page content, it adds the new indexing date. With 
                          so many sites not reporting an actual date stamp, the 
                          last indexing date may be the only date information 
                          that a search engine knows.  
                           
                          A quick search at Gigablast clearly identifies the sites 
                          with a date stamp and those without. Those with a date 
                          stamp have a modified date as well as an indexed date 
                          listed. Those with only the indexed date gave no usable 
                          date stamp to Gigablast's spider. This detailed date reporting,
                          along with Gigablast's cached copies of pages, makes
                          it a good tool in helping to pin down the date of a
                          page.  
                           
                          The Internet Archive's Wayback Machine [www.archive.org] 
                          is a better place to look. As long as the page is archived 
                          there and is not older than late 1996, looking through 
                          the various versions can help detect the difference 
                          between content change and design changes. While the 
                          early years of the archive are less complete than recent 
                          ones, just seeing when the page first appeared in the 
                          archive can give a good hint as to its creation date. 
                           
                           
                          However, bear in mind that the page's content could 
                          just have been previously published on a page with a 
                          different URL or on a different Web site and could therefore 
                          be older. Look for clues to this in the earliest pages 
                          in the archive. Also check on the site's main URL to 
                          see how far back it is archived and whether it points 
                          to the same content in a different location.  
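
The first-appearance check can also be scripted. The Python sketch below assumes the Internet Archive's CDX query interface, which this column does not otherwise cover: the endpoint and the `url`, `output`, `fl`, and `limit` parameters are given as I understand them, and the function names and sample rows are hypothetical. It only builds the query and parses a response, so no network access is needed to try it.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

# Assumed query endpoint for the Wayback Machine's capture index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def first_capture_query(page_url: str) -> str:
    """Build a query asking for the single earliest capture of a page."""
    params = {"url": page_url, "output": "json", "fl": "timestamp,original", "limit": "1"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_capture_timestamp(timestamp: str) -> datetime:
    """Convert a 14-digit capture timestamp (YYYYMMDDhhmmss, assumed GMT)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def earliest_capture(rows: list[list[str]]) -> Optional[datetime]:
    """Read the first data row of a JSON response; rows[0] holds the field names."""
    if len(rows) < 2:
        return None  # page is not in the archive
    header, first = rows[0], rows[1]
    return parse_capture_timestamp(first[header.index("timestamp")])


# Hypothetical response rows, shaped like the JSON output described above.
sample_rows = [["timestamp", "original"], ["19961231235847", "http://example.com/"]]
print(first_capture_query("example.com/"))
print(earliest_capture(sample_rows))
```

The earliest capture is only an upper bound on the creation date: the page existed by then, but it may have been posted earlier or at another URL.
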
                           
                          Yahoo! and Google have date limits that can be used 
                          to try to home in on pages from the last few months. 
                          To find a page from an older time period, try the AltaVista 
                          or AlltheWeb advanced search. Even though they are basically 
                          the Yahoo! database, they have more precise date limits 
                          than Yahoo!. However, given the multiplicity of dates 
                          and their general inaccuracy, do not depend too much 
                          on the accuracy of the search engine date limit.  
                           
                          For more recent changes to content, the search engines' 
                          cached copies can offer some help. With Gigablast, both 
                          the search engine results page and the cached copy include 
                          the last indexed date and the date stamp. Just remember 
                          that the date stamp, listed as the modified date, was 
                          the date stamp at the time that the page was last indexed. 
                          It may have been updated since then. Compare the current 
                          page to the cached version to verify.  
                           
                          With Google's recent addition of its indexing date to 
                          the cached copy of pages, it is much easier to identify 
                          the date of the cached copy. Unfortunately, it only 
                          has one cached copy available per date. To see the date, 
                          click on the "cached" link and then look in 
                          the header for the "as retrieved on" date. 
                          Note that the time is given as Greenwich Mean Time (GMT), 
                          so be sure to convert to the appropriate local time. 
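
The GMT conversion is simple arithmetic, sketched here in Python. The function name, the sample cache time, and the fixed offset are illustrative assumptions. A fixed offset ignores daylight saving changes, so the offset has to be the one in force on the date in question.

```python
from datetime import datetime, timedelta, timezone


def cache_time_to_local(gmt_time: datetime, utc_offset_hours: float) -> datetime:
    """Convert a naive GMT reading from a cached copy to a fixed local offset."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return gmt_time.replace(tzinfo=timezone.utc).astimezone(local_zone)


# A hypothetical "as retrieved on" time of 02:15 GMT on August 3, 2004,
# viewed from a zone six hours behind GMT.
local = cache_time_to_local(datetime(2004, 8, 3, 2, 15), -6)
print(local.isoformat())
```

Note that the local calendar date is a day earlier than the GMT date in this case, which matters when the exact day of indexing is at issue.
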
                         
                        DATING DIFFERENCES 
                        Time zones are an important consideration in the global 
                          network. With activity and posting on the Web coming 
                          from all around the world at all times of day and night, 
                          when the actual time of posting is of concern, be careful 
                          in assuming the time zone.  
                           
                          Even Google uses two different time zone standards. 
                          The recent addition of the GMT indexing time in the 
                          cache contrasts with the green date that sometimes appears 
                          on the Google results list. Often known as the "fresh" 
                          date (since that was the old label and the green date 
                          only appears on recently re-indexed pages), it is not 
                          based on GMT. While the cache times are given in GMT, 
                          Google seems to base the "fresh" dates 
                          on U.S. Eastern Standard Time. But with so many search 
                          firms looking into local search, it is also possible 
                          that the displayed time could be tied to the user's 
                          local settings (chosen or guessed) rather than the date 
                          and time at the creator's location.  
                           
                          Location on the Web is also difficult. Web servers are 
                          often located in a different part of the world from the 
                          author: a German living in Australia could be posting 
                          new Web pages on his server located in the U.S. Which 
                          date and time is displayed? If it is a date stamp based 
                          on the system clock, it would probably be the U.S. time 
                          zone. If it is an author-supplied date that is visible 
                          on the text of the page, then it is more likely based 
                          on the author's time zone.  
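
A short Python sketch shows how a single posting can carry two different calendar dates. The offsets are fixed, illustrative values (ten hours ahead of GMT for the author, five hours behind for the server), and the posting time and function name are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def dates_by_location(instant: datetime, utc_offsets: dict[str, float]) -> dict[str, date]:
    """Return the calendar date of one instant as seen at each fixed UTC offset."""
    return {
        place: instant.astimezone(timezone(timedelta(hours=hours))).date()
        for place, hours in utc_offsets.items()
    }


posted = datetime(2004, 3, 31, 23, 0, tzinfo=timezone.utc)
shown = dates_by_location(posted, {"author in Australia": 10, "server in the U.S.": -5})
for place, day in shown.items():
    print(place, day.isoformat())
```

The author's clock reads the morning of April 1 while the server's clock still reads the evening of March 31, so the date on the page depends on which clock supplied it.
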
                           
                          One nice feature of most Weblog implementations is that 
                          the postings include a posting date and time, depending 
                          on the blog's configuration. The time zone difference 
                          issue still applies, especially for multi-author blogs 
                          where the writers can be very geographically dispersed. 
                          Even for the single author blog, the date displayed 
                          is still at the discretion of the blogger. Some blog 
                          software lets the writer control the posting date, making 
                          it easy to add earlier postings and even future ones. 
                          Editing older posts is also easy. And while a necessary 
                          function for correcting typographical errors, bad links, 
                          and misstatements, editing does pose a problem for those 
                          needing to see the original text of the posting. 
                        DATING WRAP-UP
                         The whole Internet dating scene is quite complex, 
                          whether talking about the online relationship matching 
                          or the chronological side addressed here. The danger 
                          for the unwary information seeker is in falling into 
                          the clutches of a misleadingly dated Web page and drawing 
                          false conclusions about the page's content based on 
                          that one date. When accuracy of chronology is essential, 
                          remember to insist on more than one date verification. 
                         
                         
 Greg R. Notess (greg@notess.com; www.notess.com)
is a reference librarian at Montana State University and founder of SearchEngineShowdown.com.  
Comments? Email the editor at marydee@infotoday.com.  
   