Archiving records with OpenDocument

Understanding the problem

When NASA scientists went back to the Viking data, they found that the software used to read it no longer existed and that the engineers who had designed the computers were dead. Fortunately, paper printouts of the data had survived. But what if they hadn't? How can we ensure that the digital documents we make today can still be read tomorrow?

OpenDocument is XML

XML files are fundamentally text files with structure. Take this example:

<text:h text:style-name="Heading">
  European Union
</text:h>
<text:p text:style-name="Standard">
  The European Union is a supranational union of 25
  member states from the European continent. It
  was established under that name in 1992 by the
  Treaty on European Union (the Maastricht Treaty).
</text:p>

Even if all knowledge of the format is lost, anyone can still open the file and read the plain text between the tags to recover the information stored. That makes XML one of the most future-proof ways of storing a digital document.
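To make the point concrete, here is a minimal sketch in Python of how a future reader with no knowledge of the format could still recover the words. It uses a shortened copy of the fragment above; the function name strip_markup is hypothetical, and the only assumption about the format is that markup sits between angle brackets.

```python
import re

# A shortened copy of the example fragment from this article.
FRAGMENT = """<text:h text:style-name="Heading">
  European Union
</text:h>
<text:p text:style-name="Standard">
  The European Union is a supranational union of 25
  member states from the European continent.
</text:p>"""


def strip_markup(xml_text: str) -> str:
    """Drop everything between '<' and '>' and tidy the remaining text.

    No knowledge of OpenDocument is used here, only the assumption
    that markup is enclosed in angle brackets.
    """
    without_tags = re.sub(r"<[^>]*>", "\n", xml_text)
    # Collapse runs of whitespace and discard lines left empty by the tags.
    lines = [" ".join(line.split()) for line in without_tags.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(strip_markup(FRAGMENT))
```

This is deliberately crude: it ignores character entities such as &amp; and throws away the structure. It only shows that the text survives even when the format is treated as unknown.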

OpenDocument is human readable

Not all XML files are created equal. Some, like Microsoft's OXML (Office Open XML), are designed purely to be processed by a computer. Others, like OpenDocument, are designed to be as understandable to a human as possible. Compare:

Microsoft OXML:

  <w:b />
  <w:t>The final years of the twentieth century
  saw the birth of the Internet</w:t>


OpenDocument:

  <text:span text:style-name="Strong_20_Emphasis">
    The final years of the twentieth century
    saw the birth of the Internet
  </text:span>

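A descriptive name such as "Strong_20_Emphasis" tells the reader that the text was strongly emphasized, while a terse tag such as <w:b /> has to be looked up. Names like this would allow a future historian to discern more information about the document, even if all knowledge of the document format had been lost.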
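The "_20_" in the style name is not noise either. Assuming it follows the convention used by OpenOffice.org-derived applications, where a character that is not allowed in an XML name is written as its two-digit hexadecimal code between underscores, 20 is simply the code for a space. The short Python sketch below decodes such names; the function name readable_style_name is hypothetical.

```python
import re


def readable_style_name(encoded: str) -> str:
    """Turn '_xx_' hexadecimal escapes back into the characters they stand for.

    Assumption: the name uses the OpenOffice.org-style escaping described
    in the text. A name that happens to contain two hex digits between
    underscores for another reason would be decoded incorrectly.
    """
    return re.sub(
        r"_([0-9A-Fa-f]{2})_",
        lambda match: chr(int(match.group(1), 16)),
        encoded,
    )


if __name__ == "__main__":
    print(readable_style_name("Strong_20_Emphasis"))  # Strong Emphasis
```

Even without this decoding step, "Strong" and "Emphasis" are readable as they stand, which is the point of the comparison.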

OpenDocument is an open standard

Of course, the best situation is to never lose knowledge of how the format works. The best way to ensure this is to use an open standard. An open standard is one that is maintained by an independent standards group. It is in the public record and can be implemented by anyone.

If you store a document in a format owned by a single vendor, knowledge of the format goes with that vendor when it is gone. If you use an open standard, a public record remains. OpenDocument is an open standard, maintained by two independent standards bodies: OASIS and ISO, which publishes it as ISO/IEC 26300.

OpenDocument is platform independent

  • The files used by NASA to store the Viking data were tied to one type of computer. By the time NASA wanted those files back, the computers no longer existed.
  • Microsoft Word documents are tied to one platform and to one company. Windows 95 came out 10 years ago. Can you depend on this company still being here 200 years from now?

The OpenDocument format is not tied to any type of computer, operating system, or application. Any system that can read text can read it, so it should still be accessible to whatever systems our descendants are running 300 years from now.
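As a rough illustration of that independence, the Python sketch below reads paragraphs out of an OpenDocument-style package using only general-purpose tools: a ZIP reader and an XML parser, with no office software involved. OpenDocument files are normally ZIP packages whose main text lives in a member called content.xml. Everything else here is an assumption made for the example: the function names are hypothetical, and the package is a stripped-down stand-in built in memory, not a complete document (a real one also carries a mimetype entry and a manifest).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# A minimal stand-in for the content.xml of a text document.
CONTENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:p text:style-name="Standard">The final years of the twentieth century saw the birth of the Internet</text:p>
    </office:text>
  </office:body>
</office:document-content>"""


def build_sample_package() -> bytes:
    """Build an in-memory ZIP holding only content.xml (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", CONTENT_XML)
    return buffer.getvalue()


def read_paragraphs(package_bytes: bytes) -> list[str]:
    """Return the text of every text:p element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()).strip() for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    for paragraph in read_paragraphs(build_sample_package()):
        print(paragraph)
```

The same two steps, unpack and parse, are available in practically every programming environment, so nothing in the sketch depends on a particular vendor or machine.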