Ever get a song stuck in your head? While working on this issue of Online Searcher, my earworm was Paul Simon’s “Slip Slidin’ Away.” It seemed appropriate as I contemplated the impermanence of information in today’s world. Linkrot, content drift, web archiving issues, and reference rot threaten the very foundations of our research activities. I doubt, however, that Simon had online searchers in mind, not even when he wrote these lines:
The information’s unavailable to the mortal man
We’re working our jobs, collect our pay
Believe we’re gliding down the highway
When in fact we’re slip slidin’ away.
Information disappears from the web not only because of "natural causes" but also because of deliberate removal. It's been widely documented that the U.S. government is deleting information from its websites. Most recently, the 1600 Daily archive vanished from whitehouse.gov. The European Union's Right to Be Forgotten law also mandates the erasure of personal information from search results. Search engines, including Google, Bing, and Yahoo, are pushing back, saying this amounts to censorship.
When entire services shut up shop, the information housed there can be at risk. Take the closure of Wikispaces. One organization with which I am affiliated maintained its entire historical archive in wikis hosted on Wikispaces. The saving grace was that Wikispaces gave ample notice of its imminent demise, allowing people to salvage their data and move it somewhere else.
Could information stored in folders on your own computer disappear? Say you saved search results in a folder on your laptop. It’s a Windows 10 machine. You run the October 2018 Windows update and, presto chango, you have an empty folder. No more search results. Yeah, that happened, and Microsoft has withdrawn the update.
Even information not in electronic form is slip slidin' away. The news is full of hurricanes and fires. Structures burn or are destroyed by wind and water, and their contents cease to exist. Fragile records about the early history of North Carolina, many stored in local courthouses that were inundated with floodwaters following Hurricane Florence, may decay, rendering facts about conditions in the 16th, 17th, and 18th centuries unavailable. In Brazil, a fire roared through the National Museum, destroying irreplaceable artifacts. The museum housed more than 20 million items and was Latin America's largest anthropology and natural history collection. None of it was digitized.
There's good news on the linkrot front, however. Wikipedia announced that it has "rescued" more than 9 million broken links by pointing them to archived versions in the Internet Archive's Wayback Machine. Almost every URL referenced in close to 300 Wikipedia sites is archived as soon as the link is added or changed. How big a task is that? Try some 20 million URLs each week from 22 Wikipedia language editions. Obviously, that is not something to be done manually, so the Internet Archive uses a software robot called IABot.
The bad news is that only links in Wikipedia have been rescued. Large as it is, Wikipedia is only a small part of the web-based information suffering from linkrot. Paul Simon may have the final word on disappearing information: "The nearer your destination, the more you're slip slidin' away."