ONLINE SEARCHER: Information Discovery, Technology, Strategies

Saving the Web for Posterity
By Gary Price
March/April 2022 Issue

The Internet Archive (archive.org), founded in 1996, is a vital resource for information professionals. It’s home to many useful collections and services, including the TV News Archive (archive.org/tv); the Radio News Archive (archive.org/details/radio); the eBooks and Texts collection (archive.org/texts), which is also available with additional discovery tools at OpenLibrary.org; and the Archive-It (archive-it.org) web archiving service. At the time of writing, the Internet Archive provides access to 588 billion archived webpages dating back to 1996, plus 28 million books and texts, 14 million audio recordings, 6 million videos, 3.6 million images, and 580,000 software programs. According to its About page, the Internet Archive’s mission is “to provide Universal Access to All Knowledge.”

Its Wayback Machine (web.archive.org) also began in 1996 but did not become publicly available until 2001. Founder Brewster Kahle’s idea was to preserve history by saving copies of websites that show what they looked like at a particular point in time.

Some websites are added to the Internet Archive automatically, but not all of them are. One Wayback Machine feature that users often overlook or underuse is Save Page Now (web.archive.org/save). It lets you capture and archive any webpage, PDF, or material in other formats, provided the site owner isn’t blocking the capture and the content isn’t behind a paywall. In other words, you can archive what you see on a webpage at a given moment and know that a copy of it will be available via the Wayback Machine.
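
Save Page Now can also be triggered without using the form. A minimal sketch, assuming (as is widely done, but worth confirming against the Internet Archive’s current documentation) that opening web.archive.org/save/ followed by the full target URL starts a capture; the helper name save_page_now_url is hypothetical, and anonymous requests may be rate-limited:

```python
from urllib.parse import urlsplit

# Assumption: a capture is requested by appending the full target URL to this prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(target: str) -> str:
    """Return the Save Page Now address for an http(s) URL (hypothetical helper)."""
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {target!r}")
    return SAVE_PREFIX + target


print(save_page_now_url("https://www.infotoday.com/"))
# https://web.archive.org/save/https://www.infotoday.com/
```

Opening the printed address in a browser is equivalent to pasting the URL into the Save Page Now box.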

Why You Would Want To Do This

We all know the web is a highly ephemeral beast. What you see on Monday could be gone by Tuesday. Even if a page or document is not removed entirely, some of its text may have been changed or deleted. The same is true for images and other parts of a page.

Saving material to the Wayback Machine provides some peace of mind: you know that what you read or saw is now permanently accessible and stored with a timestamp showing when it was archived. The copy is also stored not on your computer but on the Internet Archive’s servers. This is a boon to researchers who need to document what a website looked like when they first accessed it. Reporters who save pages while fact-checking contribute to the overall trustworthiness of news. Archiving government webpages contributes to transparency and accountability, particularly when governments seek to remove data that conflicts with their worldview not only from the internet but also from agency repositories.

Using Save Page Now helps to create a more robust web archive for all users, not only by adding new material but also by creating new versions of previously archived material, since webpages are constantly updated while remaining at the same URL. It preserves cultural heritage, documents current events for future generations, and builds a storehouse of dependable information. Plus, it’s fun to look back at early versions of websites to see how primitive they appear by modern standards.

Using Save Page Now

Archiving with Save Page Now is easy, fast, and free. Simply enter or paste a URL into the Save Page Now box and wait a few moments for the process to complete. You’ll soon see the captured page as archived in the Wayback Machine. Its URL includes a timestamp recording when the capture was made (support.archive-it.org/hc/en-us/articles/208333963-Interpret-Wayback-URLs-and-messages).
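
A capture URL follows the pattern web.archive.org/web/<timestamp>/<original URL>, where the timestamp is 14 digits in YYYYMMDDhhmmss order. The sketch below, with a hypothetical function name, extracts the capture time and the original URL from such an address. It assumes the timestamp is in UTC and allows for a short modifier such as id_ directly after the digits:

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# web.archive.org/web/<14-digit timestamp><optional modifier such as id_>/<original URL>
WAYBACK_URL = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def parse_wayback_url(url: str) -> Tuple[datetime, str]:
    """Return (capture time, original URL) for a Wayback Machine capture URL."""
    match = WAYBACK_URL.match(url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {url!r}")
    # Assumption: Wayback timestamps are recorded in UTC.
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, match.group(2)
```

For an illustrative (not real) capture URL, parse_wayback_url("https://web.archive.org/web/20220301123045/https://www.infotoday.com/") returns datetime(2022, 3, 1, 12, 30, 45, tzinfo=timezone.utc) paired with "https://www.infotoday.com/".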

If you’re a registered Internet Archive user (registration is free and highly recommended), Save Page Now becomes an even more powerful service.

Let’s assume you’ve registered and logged in. You’ll now see a number of additional options not available to unregistered users.

  • Save Outlinks: Checking this box also captures the pages linked from the page you’ve submitted. So, with a single click, the URL you’ve submitted and all of the pages to which it links are captured at the same time.
  • Save Error Page: A capture is made even if the page is unavailable and returns an error.
  • Save Screen shot: A static screenshot of the page is also made.
  • Save also in my web archive: A copy of the capture is placed in your own web archive, which is included with Internet Archive registration.
  • Please email me the results: A copy of the results is sent to the email address you used to register.
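
For scripted use, these options appear to correspond to fields in the Save Page Now 2 (SPN2) API, which accepts an authenticated POST to web.archive.org/save. The field names below (capture_outlinks, capture_all, capture_screenshot, email_result) and the “LOW accesskey:secret” authorization header are assumptions based on the SPN2 documentation and should be checked against its current version; the keys come from your Internet Archive account settings. The function name is hypothetical, the sketch only builds the request without sending it, and no field is mapped for “Save also in my web archive” because its name is not assumed here:

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

SPN2_ENDPOINT = "https://web.archive.org/save"

# Assumed mapping from the Save Page Now checkboxes to SPN2 form fields.
OPTION_FIELDS = {
    "save_outlinks": "capture_outlinks",      # Save Outlinks
    "save_error_page": "capture_all",         # Save Error Page
    "save_screenshot": "capture_screenshot",  # Save Screen shot
    "email_results": "email_result",          # Please email me the results
}


def spn2_request(target: str, access_key: str, secret: str,
                 **options: bool) -> Tuple[str, Dict[str, str], str]:
    """Build (endpoint, headers, form body) for an SPN2 capture; nothing is sent."""
    unknown = set(options) - set(OPTION_FIELDS)
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    fields = {"url": target}
    for name, enabled in options.items():
        if enabled:
            fields[OPTION_FIELDS[name]] = "1"
    headers = {
        "Accept": "application/json",
        # Assumption: Internet Archive API keys use the "LOW access:secret" scheme.
        "Authorization": f"LOW {access_key}:{secret}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return SPN2_ENDPOINT, headers, urlencode(fields)
```

The returned triple can be handed to any HTTP client; keeping the request construction separate makes it easy to inspect exactly which options are being sent.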

After clicking the Save Page button, a window appears that shows each URL as it’s being captured.

If a URL is being captured for the first time, that is also noted.
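
To check ahead of time whether a URL has ever been captured, one option is the Wayback Machine availability API at archive.org/wayback/available, which returns JSON describing the closest archived snapshot, or an empty archived_snapshots object when none is found. A minimal sketch with hypothetical helper names; the network call is left out so the parsing can be tried on a saved response:

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(target: str) -> str:
    """Build the availability API request URL for target."""
    return AVAILABILITY_API + "?" + urlencode({"url": target})


def closest_snapshot(response_text: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, archived URL) of the closest snapshot, or None if none exists."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # no capture found: saving now would be a first capture
    return closest["timestamp"], closest["url"]
```

Fetch availability_query(url) with any HTTP client and pass the response body to closest_snapshot. A very recent capture may take some time to show up in this lookup.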

Saving a Copy/Paste

Several tools can save you a few seconds by sparing you from copying and pasting a URL into the Save Page Now box. One of them is the Wayback Machine’s browser extension for Chrome and Firefox.

Among its useful options is Save Page Now: click it to send the URL currently open in your browser to the Save Page Now interface.

Another useful feature of the Wayback browser add-on is that it redirects you to archived versions of a URL when the live version is unavailable and returns a 404 error code.
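
The redirect decision can be sketched as a small function. This is not the extension’s code: the function name is hypothetical, it handles only the 404 case described above, and it assumes the closest archived snapshot has already been looked up (for example, with the Wayback availability API) and passed in as a (timestamp, archived URL) pair:

```python
from typing import Optional, Tuple

Snapshot = Tuple[str, str]  # (14-digit timestamp, archived URL)


def fallback_url(live_status: int, closest: Optional[Snapshot]) -> Optional[str]:
    """Return the archived URL to redirect to, or None to stay on the live page."""
    # Only a missing page (404) triggers the fallback in this sketch.
    if live_status != 404:
        return None
    # Nothing archived means there is nowhere to redirect.
    if closest is None:
        return None
    return closest[1]
```

Returning None when no snapshot exists keeps the reader on the live error page rather than sending them to an empty archive result.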

A Lot of URLs: Batch Processing

If you have a large number of URLs to archive, the batch processing option is helpful, and it couldn’t be easier to use. To begin, you’ll need your Internet Archive login and access to Google Sheets.

  • Simply place the URLs into a Google Sheet, one URL per row.
  • Click Archive URLs (at this point, you’ll be asked for permission by Google to use the Internet Archive service).
  • Depending on server load, you should soon see the service begin to archive each URL.
  • Soon after, the spreadsheet is updated with each archived URL, a note if it’s a first archive, and, if you selected the outlinks option, how many additional URLs (outlinks) were archived at the same time.
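
Outside Google Sheets, the same one-URL-per-line workflow can be approximated with a short script. A sketch under stated assumptions: the helper names are hypothetical, the submit callback stands in for whatever actually sends each capture request (for example, opening the web.archive.org/save/ address for each entry), and the five-second pause is an arbitrary placeholder rather than a documented rate limit:

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def prepare_batch(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split one-URL-per-line input into unique http(s) URLs and skipped entries."""
    urls: List[str] = []
    skipped: List[str] = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank rows
        parts = urlsplit(line)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            skipped.append(line)  # not something Save Page Now can fetch
        elif line not in seen:
            seen.add(line)
            urls.append(line)
    return urls, skipped


def submit_all(urls: List[str], submit: Callable[[str], str],
               delay_seconds: float = 5.0) -> Dict[str, str]:
    """Call submit() once per URL, pausing between requests; return results by URL."""
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # placeholder pacing, not a documented limit
        results[url] = submit(url)
    return results
```

Duplicates are dropped before submission so the same page is not captured twice in one run, and skipped lines are returned so they can be corrected rather than silently lost.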

Note: The “Check the URLs are available in the live web” option verifies whether a URL is live without archiving it.
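
A comparable live-web check can be sketched with Python’s standard library by sending a HEAD request and reading the status code, without archiving anything. Assumptions: the function names are hypothetical, urlopen follows redirects automatically, and some servers reject HEAD requests (often with a 405), in which case a GET would be needed:

```python
import urllib.error
import urllib.request
from typing import Optional


def check_live(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the server could not be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is normally the final page's status.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions that carry the status code.
        return err.code
    except OSError:
        # DNS failures, refused connections, and timeouts.
        return None


def is_live(status: Optional[int]) -> bool:
    """Treat any 2xx or 3xx status as available on the live web."""
    return status is not None and 200 <= status < 400
```

Usage: is_live(check_live("https://www.infotoday.com/")) returns True when the page currently responds successfully.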

Bottom Line

Given the growing amount of material on the web, its ephemeral nature, and the fact that the Wayback Machine is often the only way to access archived web content, Save Page Now is an important tool for individual researchers. It also allows all of us to contribute to creating a permanent record of the web.

The Save Page Now screen as it appears to non-registered users.
The Save Page Now screen, with more options, as it appears to registered users.
The report after the Information Today, Inc. page was saved shows each URL captured.
The Wayback Machine’s Chrome extension.


Gary Price is a librarian, writer, consultant, frequent conference speaker, and editor of Library Journal’s InfoDOCKET.

 

Comments? Contact the editors at editors@onlinesearcher.net
