From Today’s Lecturer: Preserving content means preserving software, hardware

Guest Column By: Vint Cerf

Today, most of us carry mobiles with digital cameras. We use applications that produce endless varieties of digital files, some extremely complex. We use software to interact with, display or otherwise render digital objects. We even imagine that by digitizing hard copies of papers, books, photographs and films we are preserving them for the ages. We might be wrong about that.

Digital media do not generally have infinitely long lifetimes. In fact, some may last only a few years or perhaps a few decades. Worse, these media require devices to read them. How many of us still have working 5¼-inch floppy readers? 3½-inch floppy readers? DVD and CD-ROM readers? Seven-track or nine-track magnetic tape readers? Even if the media retain the data, there may be no working devices to read the bits thereon. It gets worse.

Many digital files or objects (video, audio, images, text documents, presentations, games) have complex formats that require software to render them correctly. That means that useful preservation of digital objects requires not only a readable medium and a reader but also the application software needed to render the content. The applications generally run on a computer that also requires an operating system (e.g., Microsoft Windows, Apple OS X, Linux). And, of course, the operating system has to run on specific computer hardware. It is entirely plausible that you might still have your digital files but that current-day computers cannot run the operating system and application software needed to render them.

What all of this suggests is that a program for preserving software as well as digital files is needed to assure that the digital content of today remains accessible into the distant future. This conclusion has many implications. The first is that software (applications and operating systems) will need to be archived for future use. The terms and conditions for that use will need to be established. An argument could be made for preservation privileges, not unlike fair use in copyright, so that retaining copies of software for preservation does not violate copyright or patent laws. Similarly, the right to run the software will need to be opened to the public at some point to assure that future users can access and render archived digital content that requires it. It may even be argued that present copyright and patent terms are unacceptably long if they outlast the ability of current-day hardware to execute the requisite software.

Cloud computing may have a role to play in this conundrum. It is now common to run what are called virtual machines in cloud computing data centers. What this means is that older operating systems and application software can potentially be executed in the virtual environment, providing users with the ability to render and interact with older digital content. This is not a trivial thing to implement, however.

At Carnegie Mellon University, Professor Mahadev Satyanarayanan has developed a system he calls OLIVE that can emulate hardware and effectively run older operating systems and applications. In fact, in some cases, the emulated system runs faster in the cloud than the original system did on its native hardware.

One cannot ignore the fact that archiving content, software and hardware emulation programs has a cost, and to assure long-term access to digital content, business models that support this process will have to be developed. Some government agencies, such as the National Archives, are charged with retaining information relevant to past administrations in perpetuity. To do this today, they will have to solve this problem, at least for the content relevant to their mission. One can imagine private-sector operations doing similar things for corporations and the public.

In the long run, we owe it to our descendants in the distant future to give them the ability to see the digital world as we saw it in the 20th and 21st centuries, and so we are motivated to find solutions today.

Vint Cerf is vice president and chief Internet evangelist at Google. Widely known as a “Father of the Internet,” he is the co-designer, with Robert Kahn, of the TCP/IP protocols and the basic architecture of the Internet.