In Defense of COS, or Why I Love JSON and Hate XML

I don’t miss XML

I don’t miss XML. XML would only have been a quarter as bad as it is if it didn’t introduce the ambiguity of trying to decide whether data should be an attribute of an element or the value of an element.

xml
<element attribute="attribute-value">value</element>

It would only be half bad if it didn’t introduce the unwieldy syntax of triangle brackets and an end-of-element closing syntax that uses more triangle brackets and a second copy of the element name. It would only be three quarters bad if the syntax for comments didn’t also use triangle brackets, along with a few hyphens and a required closing delimiter at the end of each comment.

xml
<!-- Please don't make me write XML comments -->

It wouldn’t be bad at all if there hadn’t been so many overspecified XML standards, written by committee and adopted by geriatric software companies and financial institutions. Building blocks such as SOAP and XML Security come immediately to mind. Try on OFX or IFX for real pain.

There are a few other XML technologies I don’t miss either. Remember schema definitions (DTD) and transformations (XSLT)? Thankfully I never actually had to do transformations. Or RELAX NG, which wasn’t a half-bad way of describing something quite bad.

xml
<data version="0.1">
  <or-version>0.1</or-version>
  <items>
    <item>first</item>
    <item>second</item>
  </items>
</data>

All of this said, XML is less annoying as a document format (e.g. HTML) than it is as a data exchange format (e.g. AJAX calls). Nonetheless, XML was probably a necessary stepping stone towards fully appreciating JSON.

I quite like JSON

Ah, JSON. Very simple. Easy to type. Easy to parse. Easy to generate. Easy to use for simple operations such as RESTful CRUD. Directly supported by JavaScript and many other languages. Heaven.

javascript
{
    "version": "32",
    "myarray": [ "first", "2", "3.01" ],
    "myobject": { "name": "value", "name2": "value2" },
    "mystring": "Always in double quotes"
}

My only annoyances would be the strict need for double quotes around names, the intolerance of trailing commas, and the inability to add comments. But these are only annoyances, and not something to complain about.
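To make those three annoyances concrete, here is a small sketch using Python's standard `json` module as a stand-in for any strict JSON parser. The sample strings and the `rejected` helper are made up for illustration.

```python
import json

# Each sample trips over exactly one of the annoyances named above.
samples = {
    "unquoted name": '{version: "32"}',
    "trailing comma": '{"version": "32",}',
    "comment": '{"version": "32" /* not allowed */}',
}


def rejected(text):
    """Return True if the strict parser refuses to parse text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


results = {label: rejected(text) for label, text in samples.items()}
print(results)
```

All three samples are refused, while the same object written as `{"version": "32"}` parses without complaint.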

There is little else to say about JSON.

But before XML and JSON there was COS

My good friend Richard Cohen introduced the Carousel Object Structure when he introduced the PDF file format back in 1993 (Carousel was the original project name for Adobe Acrobat). This predated XML and JSON, and in fact COS might have seen more widespread adoption if Adobe had been more open about COS generation and parsing libraries, or if developers had been more willing to adopt non-open-source technologies.

COS is used in FDF (Forms Data Format) and PDF (Portable Document Format) files. You can find both FDF and PDF specified in the PDF Reference. COS is not unlike JSON in its simplicity and brevity.

COS
<<
   /Version 32
   /MyArray [ (first) 2 3.01 ]
   /MyObject << /Name (value) /Name2 (value2) >>
>>

COS supports booleans, integers and real numbers, strings, names, arrays and dictionaries. These are the same basic building blocks as you will find in JSON but with stronger typing of names and numbers. The syntax is very concise, just like JSON, easy enough to parse and generate, and easy to type by hand. In this way you can use COS to generate documents of the same complexity as you would with JSON.
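To show how little it takes to generate COS, here is a minimal Python sketch of a serializer. It is my own illustration under simplifying assumptions, not Adobe's library: the `Name` class and `to_cos` function are hypothetical, names are assumed to contain only plain characters (no `#xx` escapes), and strings only escape backslashes and parentheses.

```python
class Name(str):
    """Marks a string that should be written as a COS name, e.g. /FlateDecode."""


def to_cos(value):
    """Serialize a Python value to COS text (a simplified subset)."""
    if value is None:
        return "null"
    if isinstance(value, bool):      # test bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, Name):      # test Name before str: Name is a str subclass
        return "/" + value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # COS reals have no exponent notation, so use fixed-point and trim zeros.
        text = f"{value:.6f}".rstrip("0")
        return text + "0" if text.endswith(".") else text
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(value, list):
        return " ".join(["[", *(to_cos(item) for item in value), "]"])
    if isinstance(value, dict):
        # Dictionary keys are always names in COS.
        pairs = (f"/{key} {to_cos(item)}" for key, item in value.items())
        return " ".join(["<<", *pairs, ">>"])
    raise TypeError(f"no COS representation for {type(value).__name__}")


doc = {
    "Version": 32,
    "MyArray": ["first", 2, 3.01],
    "MyObject": {"Name": "value", "Name2": "value2"},
}
print(to_cos(doc))
# << /Version 32 /MyArray [ (first) 2 3.01 ] /MyObject << /Name (value) /Name2 (value2) >> >>
```

Note the distinction JSON cannot make: `Name("FlateDecode")` comes out as `/FlateDecode` while the plain string `"FlateDecode"` comes out as `(FlateDecode)`.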

Interestingly, COS goes a bit further than XML and JSON and includes support for streams. And not just any streams, but streams that can be compressed and filtered in a number of different ways.

COS
<<
   /Length 16
   /Filter /FlateDecode
>>
stream
**compressed stream**
endstream

There are even stream filters defined for various image formats, such as DCTDecode for JPEG data. COS streams address a significant limitation of XML and JSON: the inability to efficiently embed larger data objects.
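As a rough sketch of the stream layout shown above, the following Python uses the standard `zlib` module, which produces the format that FlateDecode expects. The helper names `make_stream` and `read_stream` are hypothetical, and the reader is deliberately naive: it only understands streams written by `make_stream`, where a real parser would tokenize the dictionary and support other filters.

```python
import zlib


def make_stream(data):
    """Wrap raw bytes in a Flate-compressed COS stream."""
    compressed = zlib.compress(data)
    # /Length counts the bytes between the stream and endstream keywords.
    header = b"<< /Length %d /Filter /FlateDecode >>\n" % len(compressed)
    return header + b"stream\n" + compressed + b"\nendstream\n"


def read_stream(blob):
    """Recover the original bytes from a blob produced by make_stream."""
    header, _, rest = blob.partition(b"stream\n")
    length = int(header.split(b"/Length")[1].split()[0])
    # Trust /Length rather than searching for endstream in binary data.
    return zlib.decompress(rest[:length])


payload = b"hello COS " * 100
blob = make_stream(payload)
print(len(payload), len(blob))
```

Because the binary payload sits in the file as-is, with its length declared up front, there is no base64 detour of the kind JSON or XML would need.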

And we are not done yet. COS also includes support for indirect objects and an XREF table of all the indirect objects in a file. Indirect objects allow you to reuse objects in a document and to break complex document hierarchies into more easily digested chunks. And, hold your breath for this one, the XREF table allows random access to every indirect object in the file. This lets you access just those portions of a document that are needed, and build very large documents that are still very efficient to parse. This is exactly how PDF files are built.

PDF
%PDF-1.3
1 0 obj
<<
/Producer(Adobe Indesign)
/CreationDate(D:20110706200610+0700)
>>
endobj
2 0 obj
<</SecondObject ... >>
endobj
xref
0 10
0000000015 00000 n
0000000159 00000 n
trailer
<</Size 162
/Root 161 0 R
/Info 1 0 R
/ID[<f5d14afe5e3b99ec7aa57fe0a3d88d66><f5d14afe5e3b99ec7aa57fe0a3d88d66>]>>
startxref
282931
%%EOF
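The random access trick is easier to see in code. Below is a minimal Python sketch, under my own simplifying assumptions, that writes a few indirect objects followed by an XREF table and then fetches one object by jumping straight to its byte offset. `build_file` and `read_object` are hypothetical helpers, the trailer carries only /Size (so the result is a COS container, not a valid PDF), and the reader assumes a single XREF section and generation 0 throughout.

```python
def build_file(objects):
    """Write each body as indirect object 1..n, followed by an xref table."""
    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))                  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"               # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset       # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


def read_object(data, number):
    """Fetch one object's body without scanning the objects before it."""
    # 1. The end of the file says where the xref table starts.
    xref_at = int(data[data.rindex(b"startxref"):].split()[1])
    # 2. Skip the "xref" line and the "0 N" subsection header.
    first_entry = data.index(b"\n", data.index(b"\n", xref_at) + 1) + 1
    # 3. Entries are fixed width, so entry k sits at a computable position.
    entry = data[first_entry + 20 * number:first_entry + 20 * (number + 1)]
    offset = int(entry[:10])
    # 4. Jump to the object and read up to its endobj keyword.
    start = data.index(b"obj\n", offset) + 4
    return data[start:data.index(b"\nendobj", start)]


objects = [b"<< /Producer (Example) >>", b"<< /SecondObject true >>", b"[ 1 2 3 ]"]
data = build_file(objects)
print(read_object(data, 2))
```

The fixed 20-byte entry width is what makes this work: finding object N costs one multiplication and two reads, no matter how many objects come before it in the file.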

My wife will tell you that I’m a nostalgic guy, and I’m very nostalgic about COS. It is possible to imagine a world where JavaScript objects serialized to COS instead of JSON. And it’s possible to imagine all the interesting uses of streams, native compression support and random access to large documents other than just PDFs. It’s entirely possible that COS could have become the building block of most data exchange and document formats.

Oh, and COS supports comments. A COS comment starts with a % and runs to the end of the line, so it doesn’t require an end-of-comment closing element.
