In Defense of COS, or Why I Love JSON and Hate XML
I don’t miss XML
I don’t miss XML. XML would only be a quarter bad if it didn’t introduce the ambiguity of trying to decide whether data should be an attribute of an element or the value of an element.
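To make the attribute-versus-element ambiguity concrete, here is a minimal Python sketch. The `person` record and the `read_person` helper are made up for illustration; the point is only that both encodings are legal XML, so a consumer has to be ready for either.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same record; both are well-formed XML.
AS_ATTRIBUTES = '<person name="Ada" born="1815"/>'
AS_ELEMENTS = "<person><name>Ada</name><born>1815</born></person>"


def read_person(xml_text):
    """Return {'name': ..., 'born': ...}, accepting either encoding."""
    root = ET.fromstring(xml_text)
    record = {}
    for field in ("name", "born"):
        # Look for an attribute first, then fall back to a child element.
        value = root.get(field)
        if value is None:
            value = root.findtext(field)
        record[field] = value
    return record


attr_record = read_person(AS_ATTRIBUTES)
elem_record = read_person(AS_ELEMENTS)
```

Both calls produce the same dictionary, but only because the reader checks two places for every field.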
It would only be half bad if it didn’t introduce the unwieldy syntax of triangle brackets and an end-of-element closing syntax that uses more triangle brackets and a second copy of the element name. It would only be three quarters bad if the syntax for comments didn’t also use triangle brackets, along with a few hyphens and a required closing delimiter at the end of each comment.
It wouldn’t be bad at all if there hadn’t been so many overspecified XML standards, written by committee, adopted by geriatric software companies and financial institutions, resulting in complex building blocks such as SOAP and XML Security, with the desire to redefine everything as XML, and a hype factor that meant if you weren’t doing it in XML then some government department wouldn’t adopt it because it wasn’t open.
There are a few other XML technologies I don’t miss either. Remember schema definitions (DTD) and transformations (XSLT)? Thankfully I never actually had to work much with either of these. Remember RELAX NG? It wasn’t a half bad way of describing something quite bad.
All of this said, XML is less annoying as a document format (e.g. XHTML) than it is as a data exchange format (e.g. AJAX calls). But it is probably true that XML was a necessary stepping stone towards fully appreciating JSON.
I quite like JSON
Ah JSON. Very simple. Easy to type. Easy to parse. Easy to generate. Easy to use for simple operations such as RESTful CRUD. Directly supported by JavaScript and many other languages. Not yet taken over by institutions seeking checkbox standards compliance, which means fewer regulation-style constraints on innovation. Heaven.
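As a small illustration of how little ceremony is involved, here is a round trip through Python's standard `json` module. The `document` contents are invented for the example.

```python
import json

# An invented record using the JSON building blocks:
# strings, numbers, booleans, arrays and objects.
document = {
    "title": "In Defense of COS",
    "tags": ["json", "xml", "cos"],
    "draft": False,
    "rating": 4.5,
}

text = json.dumps(document, indent=2)  # generate
restored = json.loads(text)            # parse
```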
My only annoyances with JSON are the strict need for double quotes around names, its unforgiving attitude towards trailing commas, and the inability to add comments. But these are only annoyances, and not something to complain about.
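A quick sketch of those three annoyances, using Python's standard `json` module as the strict parser. The `is_valid_json` helper and the sample strings are illustrative only; more lenient parsers exist, but they go beyond the JSON grammar.

```python
import json


def is_valid_json(text):
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


strict = '{"name": "COS", "year": 1993}'
single_quoted = "{'name': 'COS'}"                # names must use double quotes
trailing_comma = '{"name": "COS",}'              # no comma after the last member
with_comment = '{"name": "COS" /* a comment */}' # comments are not in the grammar

results = {
    "strict": is_valid_json(strict),
    "single_quoted": is_valid_json(single_quoted),
    "trailing_comma": is_valid_json(trailing_comma),
    "with_comment": is_valid_json(with_comment),
}
```

Only the first string parses; the other three are rejected.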
There is little else to say about JSON. I love it and how it has unshackled innovation.
But before XML and JSON there was COS
My good friend Richard Cohn introduced the Carousel Object Structure when he developed PDF back in 1993 (Carousel was the project birth name for Adobe Acrobat)†. This predated XML and JSON, and in fact COS might have been adopted more widely if Adobe had been more open about COS generation and parsing libraries, or if developers had been more willing to adopt non-open-source technologies.
COS is used in FDF (Forms Data Format) and PDF (Portable Document Format) files. You can find both FDF and PDF specified in the PDF Reference. COS is not unlike JSON in its simplicity and brevity.
COS supports booleans, integers and real numbers, strings, names, arrays and dictionaries. These are the same basic building blocks as you will find in JSON but with stronger typing of names and numbers. The syntax is very concise, just like JSON, easy enough to parse and generate, and easy to type by hand. In this way you can use COS to generate documents of the same complexity as you would with JSON.
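To give a feel for the syntax, here is a rough Python sketch that writes plain Python values out in COS notation as used in PDF: `true`/`false`, numbers, `(strings)`, `/Names`, `[arrays]` and `<< dictionaries >>`. The `Name` class and `to_cos` function are hypothetical helpers, not part of any Adobe library, and the sketch makes simplifying assumptions: names contain only ordinary characters, and reals are written with four decimal places because COS has no exponent notation.

```python
class Name(str):
    """Marks a string that should be written as a COS name, e.g. /Example."""


def to_cos(value):
    """Serialize a Python value to COS text (simplified sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, Name):  # must come before str: Name is a str subclass
        return "/" + value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return format(value, ".4f")  # fixed-point; assumption: 4 places is enough
    if isinstance(value, str):
        escaped = (
            value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        return "(" + escaped + ")"
    if isinstance(value, (list, tuple)):
        return "[" + " ".join(to_cos(item) for item in value) + "]"
    if isinstance(value, dict):
        # Dictionary keys are always names in COS.
        pairs = " ".join("/" + key + " " + to_cos(item) for key, item in value.items())
        return "<< " + pairs + " >>"
    raise TypeError("no COS representation for " + type(value).__name__)


example = {
    "Type": Name("Example"),
    "Count": 3,
    "Scale": 1.5,
    "Visible": True,
    "Title": "Hello (COS)",
    "Kids": [1, 2, Name("Three")],
}
cos_text = to_cos(example)
```

Note how the distinction between the name `/Example` and the string `(Hello (COS))`, and between `3` and `1.5000`, survives in the output. That is the stronger typing mentioned above.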
Interestingly COS goes a bit further than XML and JSON and includes support for streams. And not just any streams, but streams that can be compressed and filtered in a number of different ways.
There are actually stream filters defined for the various image formats. COS streams address a significant limitation of XML and JSON, which is their inability to efficiently embed larger data objects.
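Here is a minimal sketch of what a compressed stream looks like, using the `/FlateDecode` filter (zlib/deflate) from the PDF Reference. The helper names `make_flate_stream` and `read_flate_stream` are invented, and the reader is deliberately naive: it assumes exactly the layout the writer produces rather than parsing the dictionary properly.

```python
import zlib


def make_flate_stream(data: bytes) -> bytes:
    """Wrap raw bytes in a COS stream object compressed with /FlateDecode."""
    compressed = zlib.compress(data)
    header = b"<< /Length %d /Filter /FlateDecode >>\n" % len(compressed)
    return header + b"stream\n" + compressed + b"\nendstream"


def read_flate_stream(blob: bytes) -> bytes:
    """Recover the raw bytes (sketch: assumes the layout written above)."""
    header, _, rest = blob.partition(b"stream\n")
    # /Length says how many bytes of stream data follow the keyword, so the
    # binary payload never needs escaping or base64 encoding.
    length = int(header.split(b"/Length")[1].split()[0])
    return zlib.decompress(rest[:length])


payload = b"COS " * 1000
blob = make_flate_stream(payload)
round_tripped = read_flate_stream(blob)
```

Because the length is declared up front, the payload is stored as raw binary, which is where the efficiency over text-only formats comes from.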
But we are not done yet. COS also includes support for indirect objects and an XREF table of all the indirect objects in a file. Indirect objects are objects that appear flat at the top level of the document. They allow you to reuse objects in a document, better break apart complex document hierarchies into more easily digested chunks and, hold your breath for this one, the XREF table allows random access to every indirect object in the file. This allows you to very efficiently access just those portions of a document that are needed, and to build very large documents that are still very efficient to parse. This is exactly how PDF files are built. PDF files are, in essence, like a portable NoSQL database.
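The random-access idea can be sketched in a few lines of Python. `build_file` and `read_object` are hypothetical helpers, and the output is not a valid PDF (the first line is just a COS comment); it only borrows the layout of indirect objects, fixed-width 20-byte XREF entries and the trailing `startxref` pointer. With a real file you would seek to these offsets instead of slicing an in-memory byte string.

```python
def build_file(objects):
    """Write each body as an indirect object (numbered from 1) plus an XREF table."""
    out = bytearray(b"%COS-sketch\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"          # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_object(data, number):
    """Fetch one indirect object's body without scanning the other objects."""
    # 1. The end of the file says where the XREF table starts.
    tail = data.rstrip().rsplit(b"startxref", 1)[1]
    xref_pos = int(tail.split()[0])
    # 2. Skip the "xref" and "0 N" lines, then index straight into the entries.
    first_entry = data.index(b"\n", data.index(b"\n", xref_pos) + 1) + 1
    entry = data[first_entry + 20 * number : first_entry + 20 * (number + 1)]
    offset = int(entry[:10])
    # 3. Jump to the object (sketch: assumes the body contains no "endobj").
    body_start = data.index(b"obj\n", offset) + 4
    body_end = data.index(b"\nendobj", body_start)
    return data[body_start:body_end]


objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",  # "2 0 R" refers to object 2
    b"<< /Type /Pages /Kids [] /Count 0 >>",
    b"(Hello)",
]
data = build_file(objects)
third = read_object(data, 3)
```

Looking up object 3 touches only the last few lines of the file, one 20-byte table entry and the object itself, no matter how many other objects the file contains.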
My wife will tell you that I’m a nostalgic guy and I’m quite nostalgic about COS. It is possible to imagine a world where JavaScript objects, or any other objects, were serialized to COS instead of JSON. And it’s possible to imagine all the interesting uses of streams, native compression support and random access to large documents other than just PDFs. It is not beyond the boundaries of reality to imagine COS having established itself as the building block of most of today’s data exchange and document formats.
Oh, and COS supports comments, and COS comments don’t require an end-of-comment closing element.
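In COS a comment starts with `%` and simply runs to the end of the line. As a rough illustration, here is a Python sketch that strips such comments while leaving `%` characters inside parenthesized strings alone. The function `strip_cos_comments` is a made-up helper, and it ignores stream data, which a real parser would have to skip over.

```python
def strip_cos_comments(text):
    """Remove %-to-end-of-line comments outside of (string) literals."""
    out = []
    depth = 0  # nesting depth inside (...) string literals
    i = 0
    while i < len(text):
        ch = text[i]
        if depth and ch == "\\":
            # Keep an escaped character inside a string untouched.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif ch == "%" and depth == 0:
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] not in "\r\n":
                i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


source = "<< /Title (100% COS) % trailing comment\n/Count 3 >> % done"
cleaned = strip_cos_comments(source)
```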
† Update: Regarding the origins of COS and PDF, Richard adds, “I’m afraid to add too many names as contributors because of the risk of leaving people out, but I will note that the basic PDF syntax that Jim describes was taken directly from PostScript, so give John Warnock and the original PostScript team credit there. There were a couple of attempts at a format for Carousel before what we have now, but Alan Wootton and I came up with the core of what became PDF 1.0 in a few days at my house in April 1992.”