Archive: Computer Collector Newsletter / Technology Rewind, Jan. 2004 - March 2006


A media preservation standard

by Sellam Ismail

Being an archivist by nature, with a long-term outlook on the preservation of not only computers but also computer software, I have contemplated for quite some time a universal format for archiving all types of data media. The problem is that media such as floppy disks and magnetic tape do not last forever. Computer hardware is relatively easy to preserve, since you just need to find a place to store it for the long term; software presents more of a challenge. Software is easier to store physically, but it is stored on media that are actually volatile, and bits will eventually drain off in one way or another: magnetic signals will fade from disks and tapes, punch cards and paper tape will disintegrate, etc. Therefore, a long-term solution must be devised for preserving software in all its forms.

The solution comes in two parts. First, we must devise a universal format for imaging all manner of media. This can be floppy disks, magnetic tape, punched cards, paper tape, analog audio cassettes, etc. Ideally, the format will preserve the original media in every detail and in a manner that would allow the physical media to be re-created at a future date. Second, and most critically, a methodology must be developed to ensure that such images are preserved from generation to generation, where a generation is the life of the media that the images are stored to, as well as the lifespan of the caretaker tasked to keep a set of archived images in perpetuity. To put it simply, we must rely on future generations of humans to ensure that the archives we create today to preserve software are regularly updated and moved to fresh media. If there is one lapse in the sequence, the work we do today to preserve the past may well be lost in the future.

Since the primary focus today is to preserve what software we have before it is lost, we must first concentrate on creating a universally recognized standard for imaging media. The need is urgent: the most widely deployed data storage technology of the past three decades is the floppy disk, the theoretical longevity of which is 15-20 years. In reality, disks created in the 1970s are still readable today, but may well be decaying fast. We are at a point where vast volumes of data will begin to slowly leak away.

I began a discussion on the Classic Computers mailing list in 2001 to formulate an imaging standard. My first instinct was to go with a binary format. Hans Franke, who had also given consideration to the issue of software preservation, was first to suggest that the format should be in plaintext so that it would be easier for future generations to decode, and to that end proposed an image format that used markup tags in a basic text file. After much arguing, I was convinced that Hans had the right idea. One requirement of this standard is that it should be easily readable and/or decodable by future generations. In the worst-case scenario, if only the images themselves survive into the future and documentation of the structure of the images is lost, humans of the future should be able to use simple tools (e.g. basic text editors) to decode the image files. A system of markup tags, while still cryptic in its own right, would be far preferable to a straight binary format, since tags allowing human-readable comments could be included to provide clues for decoding the image if no other supporting documentation was at hand.
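To make the idea concrete, here is a minimal sketch of what a tagged plaintext image might look like. It is not the format discussed on the mailing list: the tag names (`image`, `comment`, `sector`), the attribute names, and the choice of hexadecimal for the data are all assumptions made purely for illustration. It does show the property argued for above, namely that the file can be opened in an ordinary text editor and carries its own human-readable clues.

```python
import re


def encode_image(sectors, comment):
    """Render {(track, sector): bytes} as tagged plain text.

    The tag and attribute names are hypothetical, chosen for illustration.
    """
    lines = ["<image>", f"<comment>{comment}</comment>"]
    for (track, number), data in sorted(sectors.items()):
        lines.append(f'<sector track="{track}" number="{number}">')
        hex_text = data.hex()
        # Wrap the hex digits at 64 characters so the file stays readable.
        for i in range(0, len(hex_text), 64):
            lines.append(hex_text[i:i + 64])
        lines.append("</sector>")
    lines.append("</image>")
    return "\n".join(lines) + "\n"


SECTOR_PATTERN = re.compile(
    r'<sector track="(\d+)" number="(\d+)">\n(.*?)</sector>', re.S
)


def decode_image(text):
    """Recover {(track, sector): bytes} from text made by encode_image."""
    sectors = {}
    for track, number, body in SECTOR_PATTERN.findall(text):
        # Join the wrapped lines back into one run of hex digits.
        sectors[(int(track), int(number))] = bytes.fromhex("".join(body.split()))
    return sectors


if __name__ == "__main__":
    disk = {(0, 1): b"HELLO, FUTURE", (0, 2): bytes(16)}
    text = encode_image(disk, "Each sector holds bytes written as hex digits")
    print(text)
    assert decode_image(text) == disk
```

A real standard would also have to record low-level details such as encoding, timing, and damaged regions if the physical media are ever to be re-created; this sketch covers only the decoded sector contents.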

In November of 2001, I presented a paper[1], co-developed with archaeologist Christine Finn and titled "The Valley of Lost Data: Excavating Hard Drives and Floppy Disks," at the yearly Archaeologie und Computer (Archaeology and Computer) conference in Vienna, Austria. This paper primarily discussed the idea of treating computer media as virtual archaeological dig sites and mining the data therein to learn about past cultures. The paper also addressed the issue of data longevity and described a crude example of the data preservation format that we had been discussing on the Classic Computers mailing list.

Discussions of the image format standard lapsed until just recently, when I revived the topic once again on the Classic Computers mailing list and indicated that this time I was serious about developing a media imaging format that would be universally recognized as a formally codified standard in the computer industry as well as in institutional museum environments. A lively and sometimes cantankerous debate ensued. It is obvious that many people are passionate about preserving software and want to make sure that whatever format is devised is indeed adequate for the long-term task for which it is being designed.

A good amount of groundwork was laid out. Many good suggestions were made with contributions from at least a dozen individuals. I collected the most obvious and the most interesting suggestions with the intent to collate them into a single initial working document that will be used as the basis for further discussing and developing the standard. A mailing list was created to carry these discussions [2].

Once the standard has been discussed and debated and an initial draft of the standard is written, the next step will be to share the draft with relevant organizations and garner support. The goal is to have a standard recognized by an international standards organization such as the ISO.

As I mentioned earlier, creating the standard is the easy part. The bigger task will be to convince the various people and organizations that are preserving software to implement the standard. But even more challenging will be actually preserving archives of computer software well into the future. Even storing imaged media on CD-ROM or DVD-ROM means that the life expectancy of those archives is, at best and as far as we currently know, something like a century. If the intent of this archival standard is to effect eternal historical software archives, it is imperative that each institution or organization that tasks itself with preserving computer software put into place policies and procedures for constantly updating and refreshing the media on which historical software archives are stored, at least until such time as a permanent storage medium can be devised that lasts virtually forever. This means that every generation (of humans) must periodically and persistently copy the archives onto fresh media, or move the archives onto new media formats that spring forth in the future. In the end, it is our collective vigilance that matters most.
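As one illustration of what such a refresh procedure could involve, the sketch below copies every file in an archive to a new location and confirms, by comparing SHA-256 digests, that each copy matches its original. The function names and the directory layout are hypothetical, and the use of checksums is an assumption of this example rather than anything prescribed by the proposed standard.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_archive(old_root, new_root):
    """Copy an archive onto fresh media and verify every file.

    Returns a manifest mapping each relative path to its digest, which
    the next caretaker can use to check the archive again later.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    manifest = {}
    for source in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = source.relative_to(old_root)
        target = new_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Re-read both files so a bad write is caught immediately.
        before, after = sha256_of(source), sha256_of(target)
        if before != after:
            raise IOError(f"copy of {relative} does not match the original")
        manifest[relative.as_posix()] = after
    return manifest
```

Keeping the returned manifest alongside the archive gives each generation a way to confirm that nothing has silently decayed since the last refresh.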

I am very committed to developing a standard as I've described and am excited at the prospect of being part of something that will allow future generations to have access to software from the earliest periods of computing history.