Will HistoryofInformation Survive Functionally in the Longer Term as a Set of Printed Books?

On July 16, 2020 I celebrated my 75th birthday, and though I remain in good health, from time to time I wonder about the fate of HistoryofInformation after I inevitably pass. Because HistoryofInformation traces the history of information and media back to the earliest records thousands of years ago, when I think of the longer-term future I think in terms of centuries rather than decades. I am cognisant that the issue of long term preservation of digital information remains unresolved, chiefly because we have no experience in preserving data beyond the seven decades since 1945, when electronic computing began. Data preservation has been of interest to me since the beginning of HistoryofInformation more than twenty years ago. You will find relevant entries on various elements of the long term data preservation problem within HistoryofInformation, indexed under the Conservation, Preservation & Restoration theme.

As of 2020, no one knows how long the "most durable" storage media currently used for electronic data may remain operational, or whether a more permanent medium for storage will ever be invented. Claims are made periodically for various long term data preservation schemes. One of the more credible schemes is the Arctic World Archive, stored 300 meters under the permafrost in a decommissioned coal mine in the Svalbard archipelago, Norway. That program stores digital information on proprietary Piql film made of polyester coated in silver halide crystals and powder-coated with iron oxide. Supposedly the Piql film has a life span of at least 500 years, and possibly up to 2000 years, if stored in optimum freezing conditions. This attempt at a fail-safe scheme stores the data in a passive way, but does not maintain its functionality. Instead it provides methods that the proprietors hope will allow the data to be brought back to functionality at some date projected into the future. It is conceived as a kind of time capsule for data. Coincidentally the designers of the scheme selected an advanced method of microfilming for data storage.

My longer term goals for HistoryofInformation differ from the intention of the Arctic World Archive. What I would like to facilitate is the functionality of the data and programming online for some extended period after I pass. The value of the database is in its accessibility and utility, as most of the information that I have written about in this project is available elsewhere in printed books or in digital form. Beyond the uncertainty of the durability of storage media, in any attempt at long term preservation we must recognize that programming conventions are in a constant state of evolution, and the most logical assumption must be that even if the data comprising HistoryofInformation exists, at some point in the future it may no longer be accessible or operational. Digital archives at governments and institutions currently work under the assumption that digital information must be curated for long term preservation. By this I mean that for "long term preservation" the data will have to be migrated to new storage media as they are developed, and it will also need to be converted to new programming conventions as they evolve. Anyone who has been doing word processing for the past thirty or forty years already possesses word processing files that are no longer readable. If this problem occurs within a few decades, what will happen in a century?

A corollary of this requirement of curation is that preservation of digital information is far more costly and complex than simply placing a book on a library shelf. This we have learned through our few decades of experience. How much data will be preserved, how much data might be lost in these active conversion processes intended to maintain data and its functionality, and how much data will be lost simply through "neglect," are long term unknowns. Undoubtedly some of these issues may be resolved in the future, and theoretically there could be a time when data left in storage will be accessible to readers with the knowledge or skill to read them over millennia, just as cuneiform tablets and Egyptian hieroglyphs described in HistoryofInformation are read by experts today.

I am also aware that many other people share my concerns about the long term preservation of data and its functionality, and that there are organizations such as the Internet Archive that intend to provide long term curated storage facilities for websites like mine. More than likely I will contract with one or more organizations to provide services to maintain my websites after I pass. But no one can predict the long term future of any organization with certainty, and there is always the possibility that different people in different circumstances will interpret any agreement signed during my lifetime in a different way at some point in the future.

In 2014 I added an entry to HistoryofInformation headed "Imagine Publishing the Wikipedia in 1000 Physical Volumes??" It described a crowd-funding project to actually print out the Wikipedia in a set of maybe 1000 printed volumes that might have comprised over one million pages at the time. As far as I know, the project was never funded, and never occurred. Because of the dynamic nature of the Wikipedia, a print-out would have only captured it at a moment in time. Would that have any value? When I wrote that database entry in 2014 I doubted the value of such a project. When I returned to this issue in November 2020, what was available from the would-be publishers of that massive 1000-volume project was a series of single-volume current paperback books on specific subjects, based on Wikipedia data and published by PediaPress.com in association with the Wikipedia.

Thinking about the 2014 scheme to print out the Wikipedia, by November 2020 when I sat down to write this brief essay, I realized that my perspective had changed. As I think about long term preservation of HistoryofInformation, if that is actually to occur after I pass, I realize that as of 2020, and possibly for the foreseeable future, the only way to assure preservation of my work over a significant period of time may be to print out each of the roughly one hundred themes in HistoryofInformation as a series of volumes on archival paper, and to place them in whatever institutional libraries might be willing to preserve them. As much as I would like to ensure that my data will be curated into the future, how could I be sure of that? How will I have confidence that any data curation agreement that I sign will be implemented long after the people who signed the agreement have died? What we do know from long experience, as documented in HistoryofInformation, is that some copies of books that are distributed widely, and preserved in libraries, tend to survive with a reasonable degree of probability, and that archival paper has a shelf-life of at least 300 years. Many books printed in the 15th century are in beautiful legible condition today.

It seems ironic that, in spite of the immense technological progress since electronic computing began in 1945, when we think of long term preservation of data and its functionality in 2020, the only method in which we truly have long term confidence remains traditional printing in book form on archival paper. With this in mind I am wondering whether HistoryofInformation will someday occupy fifty or one hundred bound volumes, and, if the bound volumes exist on library shelves, whether they will ever actually be consulted and used.

Jeremy M. Norman
November 20, 2020