As an Internet user, you know it would be difficult to
overestimate the impact of information technology on our
lives. In his book InfoCulture, museum curator
Steven Lubar says our information machines, “and the social
structures that they are part of, have come to define
our culture, at least as much as ethnicity, race, or geography.
How we feel about the world around us, about one another,
even about ourselves has been changed by these machines
and the way we’ve chosen to use them.” Of course, two
of the primary technologies that have had a profound
impact on how we experience the world are radio and
television, yet no definitive archives of those media
exist. Yes, there are small collections of programs
and a few archives of historical footage, but much of
the early days of those media has been lost forever.
The Internet Archive is an organization and a Web site
working to ensure the same thing does not happen to
our digital media.
The Wayback Machine
Located in the Presidio of San Francisco, the Internet
Archive was founded as a nonprofit organization in 1996.
Its mission is “to build an ‘Internet library,’ with
the purpose of offering permanent access for researchers,
historians, and scholars to historical collections that
exist in digital format.”
“Libraries exist to preserve society’s cultural artifacts
and to provide access to them,” says a note on the archive’s
site [www.archive.org].
“If libraries are to continue to foster education and
scholarship in this era of digital technology, it’s
essential for them to extend those functions into the
digital world.”
The note adds that “without cultural artifacts, civilization
has no memory and no mechanism to learn from its successes
and failures.” Collaborating with such institutions
as the Library of Congress and the Smithsonian, the
archive is striving to preserve those artifacts for
future generations.
According to Brewster Kahle, director of the archive,
the average life span of a Web page is about 100 days.
But the archive’s Web site includes a Wayback Machine
that lets you view pages dating from 1996. Do you want
to know what the front page of Yahoo! looked like on
February 9, 1997? Just enter www.yahoo.com in the Wayback
Machine’s search box on the site’s home page. Want to
see Amazon.com from December 12, 1998? Just enter its
URL. An advanced search interface is available, too,
but the site doesn’t offer an indexed text search of
the documents in the collection. The editors are working
on it, however, and full-text searching may be available
soon.
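If you would rather jump straight to a dated snapshot than type into the search box, a short script can build the address for you. The Python sketch below is only an illustration. It assumes the Wayback Machine accepts addresses of the form web.archive.org/web/ followed by a date stamp and the site's URL, a pattern this article does not describe, and the function name wayback_url is made up for the example.

```python
from datetime import date

# Assumption: snapshots are addressed as <prefix>/<YYYYMMDD>/<site>.
# This pattern is not taken from the article; check it against the
# archive's own documentation before relying on it.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(site: str, day: date) -> str:
    """Build a (hypothetical) snapshot address for a site on a given day."""
    timestamp = day.strftime("%Y%m%d")  # e.g. 19970209
    return f"{WAYBACK_PREFIX}/{timestamp}/{site}"


if __name__ == "__main__":
    # The two examples mentioned in the text.
    print(wayback_url("www.yahoo.com", date(1997, 2, 9)))
    print(wayback_url("www.amazon.com", date(1998, 12, 12)))
```

If no copy exists for the exact day requested, the archive may show the nearest one it holds, so the date on the page you get back can differ from the one you asked for.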
The archive hasn't recorded every page from the past.
Some sites weren’t included because the archive’s automated
crawlers weren’t aware of them. Some weren’t included
because the sites were password-protected or otherwise
inaccessible. Some were removed because their Web site
administrators asked that they be taken out. You also
should note that the Wayback Machine does not add pages
until at least 6 months after they are collected, and in
some cases, updates can take 12 months.
Still, the Wayback Machine already has archived more
than 10 billion Web pages, which makes it one of the
world’s largest publicly accessible databases. It contains
100 terabytes of data, and it’s growing at a monthly
rate of about 12 terabytes. To put that figure in perspective,
consider that the Library of Congress contains about
20 terabytes of data. The Internet Archive’s FAQ points
out, “If you tried to place the entire contents of the
archive onto floppy disks (we don't recommend this!)
and laid them end to end, it would stretch from New
York, past Los Angeles, and halfway to Hawaii.”
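The floppy disk comparison is easy to check with a little arithmetic. The Python sketch below uses the 100-terabyte, 12-terabyte, and 20-terabyte figures quoted above. Everything else is an assumption made for illustration: a standard 3.5-inch floppy holding 1,474,560 bytes and measuring about 90 millimeters across, and rough great-circle distances of 3,940 kilometers from New York to Los Angeles and 4,100 kilometers from Los Angeles to Honolulu.

```python
import math

# Figures quoted in the article (decimal terabytes assumed).
ARCHIVE_BYTES = 100 * 10**12
MONTHLY_GROWTH_BYTES = 12 * 10**12
LIBRARY_OF_CONGRESS_BYTES = 20 * 10**12

# Assumptions, not from the article: a "1.44 MB" 3.5-inch floppy.
FLOPPY_BYTES = 1_474_560
FLOPPY_WIDTH_M = 0.090

# Assumptions: approximate great-circle distances in kilometers.
NEW_YORK_TO_LOS_ANGELES_KM = 3_940
LOS_ANGELES_TO_HONOLULU_KM = 4_100


def floppy_count(total_bytes: int) -> int:
    """Number of floppies needed to hold total_bytes."""
    return math.ceil(total_bytes / FLOPPY_BYTES)


def floppy_line_km(total_bytes: int) -> float:
    """Length in kilometers of that many floppies laid end to end."""
    return floppy_count(total_bytes) * FLOPPY_WIDTH_M / 1000


if __name__ == "__main__":
    line_km = floppy_line_km(ARCHIVE_BYTES)
    target_km = NEW_YORK_TO_LOS_ANGELES_KM + LOS_ANGELES_TO_HONOLULU_KM / 2
    print(f"Floppies needed: {floppy_count(ARCHIVE_BYTES):,}")
    print(f"Line of floppies: {line_km:,.0f} km")
    print(f"New York to halfway to Hawaii: {target_km:,.0f} km")
    print(f"Archive vs. Library of Congress: "
          f"{ARCHIVE_BYTES / LIBRARY_OF_CONGRESS_BYTES:.0f}x")
    print(f"Months to add another 100 TB: "
          f"{ARCHIVE_BYTES / MONTHLY_GROWTH_BYTES:.1f}")
```

Under those assumptions, the line of disks comes out to roughly 6,100 kilometers, close to the 6,000 or so kilometers the FAQ's route covers, so the comparison holds up as a ballpark figure.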
Special collections
Besides the Wayback Machine, the archive offers several
special collections. For example, the Moving Image Collections
include the Prelinger Archives, which contain more
than 900 digitized industrial, educational, and government
films dating from 1903. You can find, for instance,
amateur films of construction of the Golden Gate Bridge
and of the New York World's Fair of 1939.
The Moving Image Collections also include dozens of
archived episodes of “The Computer Chronicles” and “Net
Café.” There’s a sampling of “Orphan films” from
the Orphan Film symposium at the University of South
Carolina, and you can access the World at War Collection,
which was created through an Internet Archive contest
that challenged people to create short films demonstrating
why access to history matters.
An audio archive, etree.org, is a network of mailing
lists and FTP servers that provide access to high-quality
digital recordings of live music performances. All the
concerts available through the servers are performances
by musicians and bands that allow noncommercial recording
and distribution of their live concerts.
The archive’s Text Collections page provides access
to such electronic text projects as The International
Children’s Digital Library, Project Gutenberg, Arpanet,
and the Million Book Project.
The Internet Archive also is collaborating with Macromedia
to make thousands of software titles available for remote
execution.
A September 11 archive [http://web.archive.org/collections/sep11.html]
includes thousands of Web pages from news organizations,
government and military agencies, and charitable organizations.
An Election 2000 collection [http://web.archive.org/collections/e2k.html]
contains 800 gigabytes of relevant data gathered from
August 1, 2000, to January 21, 2001. You can see, for
example, how www.georgebush.com looked on Election Day,
Tuesday, November 7, 2000.
Practical purposes
Now you might be thinking, “Well, I’m
glad somebody is archiving the Internet, and it’s nice
that I could see what Yahoo! looked like in February
of 1997, but why would I want to?”
There are a number of practical purposes for the archived
pages. In an article in ONLINE magazine (“The
Wayback Machine: The Web's Archive,” March/April 2002),
Greg R. Notess, reference librarian at Montana State
University, points out several possible uses: “Patent
searchers can verify prior art. Business experts can
look up failed companies' business plans. Employers
can investigate job applicants' student Web pages. Sources
lost because of complex URL shifting can be found by
their old URL on the Wayback Machine.”
Notess also points out that “the ability to view a
range of versions of a particular page, and to browse
the archived site itself, offers a range of uses. A
new Web designer can look at previous incarnations of
a site, even if the organization itself never archived
the various versions. A new business can look at their
competitors' early designs and avoid the same mistakes.
And the researcher who is trying to track down the online
resources from the bibliography of a 4-year-old paper
can find them in the archive, even if they have otherwise
vanished from the current Web.”
(Notess’s article also contains excellent information
on how to search the archive. You can read the article
on the Web at https://www.infotoday.com/online/mar02/OnTheNet.htm.)
The Internet Archive also could be used to explore
the role information technology is playing in our lives.
Bernardo Huberman of the Xerox Palo Alto Research Center
has pointed out that “researchers could use the Archive’s
Web snapshots in combination with usage statistics to
compare how people in different countries use the Web
over long periods of time.... Political scientists and
sociologists could use the data to study how public
opinion gets formed. For example, suppose a device for
increasing privacy became available: Would it change
usage patterns?”
Answering such questions will be increasingly important
as the digital information revolution continues. New
technologies will arrive, and each will have the potential
to enhance or diminish society. We continually will
need to assess the impact they have had on us and on
our view of the world. Online libraries such as the
Internet Archive will be able to help us with the task
for generations to come. As the site says, “Internet
libraries can change the content of the Internet from
ephemera to enduring artifacts of our political and
cultural lives.”
Thomas Pack [ThomasPack@aol.com]
is a freelance writer who lives near Louisville, Kentucky.