Darius J Chuck

Darius J Chuck – Format

Specification draft for a simple data interchange format [1]

Published close to 02.02.2020 02:02:20.20 CET.

This specification introduces an anonymous simple data interchange Format [2], with potential advantages over JSON, YAML, XML, S-expressions, TOML, HOCON, and others.

In describing the Format, JSON will be used as a reference. Reading the description at JSON.org is advised.

The Format is very simple. It is essentially text in which paired Brackets ([ and ]) define structural relationships (i.e. nesting). On top of that there is a generalized Escape mechanism which allows encoding Brackets as part of text as well as mixing in more sophisticated grammars.

The simplicity and the very small number of building blocks or syntactical elements make the Format easier to read and write than any other format, both for humans and machines. The Format is completely programming language-independent and it is trivial to write a parser for it.

There is only one structural element (the Pair), on top of which others (e.g. conventional dictionaries or lists) can be built without any separate syntax (unlike JSON, which has distinct syntax for objects and arrays).

Likewise, there is only one non-structural element (the Fragment), on top of which others can be defined (strings, numbers, booleans, …).

You may want to look at some examples before proceeding.

The Grammar

The Grammar is (notation borrowed from JSON.org):

Document
    Fragments
    Pairs

Fragments
    Fragment
    Fragment Fragments

Fragment
    Escape
    Characters

Escape
    Begin Splice Escaped

Pairs
    Pair
    Pair Pairs

Pair
    Fragments Begin Document End

Begin
    '['

End
    ']'

Splice
    '\'

Block
    '>'

Characters
    Possibly empty sequence of characters excluding Brackets.

Whitespace
    Same as JSON's whitespace.

Escaped
    See The Escape mechanism.
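
As an illustration of how small a parser for the Grammar can be, here is a minimal Python sketch that ignores Escapes entirely. The function name parse and the result shape are assumptions made for this example and are not part of the specification: a Document becomes a str when it holds only Characters, or a list of (fragments, child) tuples when it holds Pairs. Tolerating trailing Whitespace after the last Pair is also an assumption.

```python
def parse(text):
    """Parse escape-free text into nested Pairs (hypothetical helper)."""
    pos = 0

    def document():
        nonlocal pos
        pairs, buf = [], []
        while pos < len(text) and text[pos] != "]":
            ch = text[pos]
            pos += 1
            if ch == "[":
                child = document()
                if pos >= len(text):
                    raise ValueError("missing closing Bracket")
                pos += 1  # consume the End that closes this Pair
                pairs.append(("".join(buf), child))
                buf = []
            else:
                buf.append(ch)
        tail = "".join(buf)
        if not pairs:
            return tail  # a Document made of Characters only
        if tail.strip(" \t\n\r"):
            # Assumption: only Whitespace may follow the last Pair.
            raise ValueError("Fragments after the last Pair")
        return pairs

    result = document()
    if pos < len(text):
        raise ValueError("unmatched closing Bracket")
    return result


print(parse("a [b [c]]"))
```

The nested function mirrors the recursive Document and Pair rules: each Begin opens a child Document and each End closes it.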

The Escape mechanism [3]

The Grammar goes together with the very general and expressive Escape mechanism which, among other things, allows encoding Brackets as non-significant characters in Fragments.

An Escape sequence starts with Begin followed by Splice. The extent and meaning of the rest of the Escaped portion that follows varies depending on how it starts. Below is an incomplete semi-formal description of this:

In the rules below, a suffix after an underscore names the matched element so that later lines can refer to it.

if Escaped portion starts with: (Begin | End | Splice)_Character
then sequence terminates at: Character
and Resolves to: Character
example: [\[ Resolves to [

else if Escaped portion starts with: Whitespace Characters End_Terminator
then sequence terminates at: Terminator
and Resolves to: portion between Splice and Terminator [4]
example: [\ text ] Resolves to text, including the Whitespace on either side of it

else if Escaped portion starts with: Block Fragments_Terminator End Line_A
then sequence terminates at: first occurrence of Line_B Terminator_Resolved
and Resolves to: portion between Line_A and Line_B
example: See Heredoc Example

else if Escaped portion starts with: Fragments_Terminator End
then sequence terminates at: first occurrence of Terminator_Resolved
and Resolves to: portion between End and Terminator_Resolved
example: [\EOT]textEOT Resolves to text

The general idea is that a Splice may be followed by a definition of a new grammar which supersedes the previous grammar. The definition should specify the grammar’s Terminator which, when encountered, restores the previous grammar.
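
The three non-heredoc forms described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name resolve_escape and its (resolved, next_pos) return value are invented for this example, the Terminator is assumed to contain no Escapes of its own, and the Block form is deliberately left out.

```python
WHITESPACE = " \t\n\r"  # JSON's whitespace characters


def resolve_escape(text, start=0):
    """Resolve the Escape beginning at text[start]; return (resolved, next_pos)."""
    if not text.startswith("[\\", start) or start + 2 >= len(text):
        raise ValueError("no complete Escape at this position")
    first = text[start + 2]
    if first in "[]\\":
        # Single-character form: resolves to the Bracket or Splice itself.
        return first, start + 3
    if first == ">":
        raise NotImplementedError("Block (heredoc) form is not covered here")
    close = text.find("]", start + 2)
    if close == -1:
        raise ValueError("Escape is missing its End")
    if first in WHITESPACE:
        # Whitespace form: everything between Splice and the closing End.
        return text[start + 2:close], close + 1
    # Terminator form: the text up to End names the Terminator.
    terminator = text[start + 2:close]
    end = text.find(terminator, close + 1)
    if end == -1:
        raise ValueError("Terminator not found")
    return text[close + 1:end], end + len(terminator)


print(resolve_escape("[\\EOT]textEOT"))
```

Returning the position just past the Escape lets a caller resume scanning with the previous grammar, which matches the idea of the Terminator restoring it.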

Heredoc Example

The following Escape sequence:

[\>EOT]
text
EOT

Resolves to:

text

This functions similarly to the mechanism of here documents available in some programming languages and formats.
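
A minimal Python sketch of resolving this Block form is given below. The name resolve_heredoc is hypothetical, and the sketch assumes that Line means a single line feed character and that the Terminator contains no Escapes; the draft does not state either point.

```python
def resolve_heredoc(text, start=0):
    """Resolve a Block Escape at text[start]; return (resolved, next_pos)."""
    if not text.startswith("[\\>", start):
        raise ValueError("no Block Escape at this position")
    close = text.find("]", start + 3)
    if close == -1:
        raise ValueError("Escape is missing its End")
    terminator = text[start + 3:close]
    # Assumption: the line break after End is a single "\n".
    if text[close + 1:close + 2] != "\n":
        raise ValueError("expected a line break after End")
    body_start = close + 2
    # The sequence ends at the first line break followed by the Terminator.
    line_b = text.find("\n" + terminator, body_start)
    if line_b == -1:
        raise ValueError("Terminator line not found")
    return text[body_start:line_b], line_b + 1 + len(terminator)


resolved, next_pos = resolve_heredoc("[\\>EOT]\ntext\nEOT")
print(repr(resolved))
```

Because the body is taken verbatim, Brackets inside it need no further escaping.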

Resolution

Before further processing, a Document encoded in the Format should undergo Resolving: a process which transforms all Escapes into their Resolved forms (as defined above) and joins them with adjacent Fragments into Resolved Fragments.
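
One possible way to picture Resolving is as a pass that turns raw text into a flat stream of structural Brackets and Resolved Fragments. The Python sketch below is an assumption-laden illustration, not the specified procedure: the name resolved_fragments and the (kind, text) token shape are invented here, and only the single-character and Whitespace forms of Escape are handled.

```python
WHITESPACE = " \t\n\r"  # JSON's whitespace characters


def resolved_fragments(text):
    """Return a list of ("begin", ""), ("end", "") and ("fragment", s) tokens."""
    tokens, buf, pos = [], [], 0

    def flush():
        # Emit the Characters and Resolved Escapes gathered so far as one Fragment.
        if buf:
            tokens.append(("fragment", "".join(buf)))
            buf.clear()

    while pos < len(text):
        if text.startswith("[\\", pos):
            if pos + 2 >= len(text):
                raise ValueError("unterminated Escape")
            first = text[pos + 2]
            if first in "[]\\":
                buf.append(first)  # joins with the adjacent Characters
                pos += 3
            elif first in WHITESPACE:
                close = text.find("]", pos + 2)
                if close == -1:
                    raise ValueError("Escape is missing its End")
                buf.append(text[pos + 2:close])
                pos = close + 1
            else:
                raise ValueError("this sketch does not handle other Escape forms")
        elif text[pos] == "[":
            flush()
            tokens.append(("begin", ""))
            pos += 1
        elif text[pos] == "]":
            flush()
            tokens.append(("end", ""))
            pos += 1
        else:
            buf.append(text[pos])
            pos += 1
    flush()
    return tokens


print(resolved_fragments("a[\\[b [c]"))
```

Keeping Brackets as separate tokens means an escaped Bracket inside a Fragment can no longer be confused with a structural one.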

Encoding structural elements

A List of values may be encoded as a sequence of Pairs where the Fragments are empty or consist entirely of Whitespace:

[parsley][sage][rosemary][thyme]

or, equivalently:

[parsley]
[sage]
[rosemary]
[thyme]

This would be equivalent to the following JSON array:

["parsley", "sage", "rosemary", "thyme"]
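
The List convention can be sketched in a few lines of Python. The function name decode_list is hypothetical, and the sketch assumes flat input: no nesting and no Escapes inside the values.

```python
import json
import re

WHITESPACE = " \t\n\r"  # JSON's whitespace characters


def decode_list(text):
    """Decode a flat, escape-free List into a Python list of strings."""
    items, last = [], 0
    for match in re.finditer(r"\[([^\[\]]*)\]", text):
        # Fragments before each Pair must be empty or Whitespace only.
        if text[last:match.start()].strip(WHITESPACE):
            raise ValueError("non-Whitespace Fragment before a List item")
        items.append(match.group(1))
        last = match.end()
    if text[last:].strip(WHITESPACE):
        raise ValueError("unexpected text after the last List item")
    return items


herbs = decode_list("[parsley]\n[sage]\n[rosemary]\n[thyme]")
print(json.dumps(herbs))
```

Both spellings of the herb list shown above decode to the same result with this sketch.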

A Dictionary may be encoded as a sequence of Pairs where Fragments are non-empty and non-Whitespace:

Name [Parsley, Sage, Rosemary and Thyme]
Artist [Simon & Garfunkel]
Release date [October 10, 1966]
Label [Columbia]

Equivalent JSON object:

{
    "Name": "Parsley, Sage, Rosemary and Thyme",
    "Artist": "Simon & Garfunkel",
    "Release date": "October 10, 1966",
    "Label": "Columbia"
}

Note that this encoding uses a convention where leading and trailing Whitespace is not part of the Dictionary’s keys. To have it included, it should be escaped:

[\ padded key ] [value]

Equivalent JSON:

{
    " padded key ": "value"
}
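
The Dictionary convention, including the padded key case, might be implemented along these lines in Python. The names decode_dict and join_key are hypothetical. The sketch assumes flat, well-formed input, handles only the Whitespace form of Escape in keys, and trims unescaped Whitespace from the ends of a key before joining, which is one reading of the convention described above.

```python
WHITESPACE = " \t\n\r"  # JSON's whitespace characters


def join_key(parts):
    """Join (text, escaped) pieces, trimming only unescaped outer Whitespace."""
    parts = list(parts)
    if parts and not parts[0][1]:
        parts[0] = (parts[0][0].lstrip(WHITESPACE), False)
    if parts and not parts[-1][1]:
        parts[-1] = (parts[-1][0].rstrip(WHITESPACE), False)
    return "".join(piece for piece, _ in parts)


def decode_dict(text):
    """Decode a flat Dictionary into a Python dict of strings."""
    result, parts, pos = {}, [], 0
    while pos < len(text):
        if text.startswith("[\\", pos):
            first = text[pos + 2:pos + 3]
            if first == "" or first not in WHITESPACE:
                raise ValueError("only the Whitespace form of Escape is handled")
            close = text.find("]", pos)
            if close == -1:
                raise ValueError("Escape is missing its End")
            parts.append((text[pos + 2:close], True))  # kept verbatim
            pos = close + 1
        elif text[pos] == "[":
            close = text.find("]", pos)
            if close == -1:
                raise ValueError("missing closing Bracket")
            result[join_key(parts)] = text[pos + 1:close]
            parts = []
            pos = close + 1
        else:
            nxt = text.find("[", pos)
            nxt = len(text) if nxt == -1 else nxt
            parts.append((text[pos:nxt], False))
            pos = nxt
    return result


print(decode_dict("[\\ padded key ] [value]"))
```

Trimming before joining is what keeps the escaped spaces in the padded key while still discarding the unescaped space that separates the key from its value.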

Structural elements may be nested freely:

songs [
    [
        title [Scarborough Fair / Canticle]
        length [3:10]
    ]
    [
        title [Patterns]
        length [2:45]
    ]
    [
        title [Cloudy]
        length [2:15]
    ]
]

Equivalent JSON:

{
    "songs": [
        {
            "title": "Scarborough Fair / Canticle",
            "length": "3:10"
        },
        {
            "title": "Patterns",
            "length": "2:45"
        },
        {
            "title": "Cloudy",
            "length": "2:15"
        }
    ]
}
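
To show how the List and Dictionary conventions compose under nesting, here is a short recursive Python sketch. The name decode is hypothetical, Escapes are not handled, and rejecting a Document that mixes empty and non-empty keys is an assumption, since the draft does not say how such a Document should be read.

```python
import json

WHITESPACE = " \t\n\r"  # JSON's whitespace characters


def decode(text):
    """Decode escape-free text into nested str, list and dict values."""
    pos = 0

    def document():
        nonlocal pos
        keys, values, buf = [], [], []
        while pos < len(text) and text[pos] != "]":
            ch = text[pos]
            pos += 1
            if ch == "[":
                values.append(document())
                if pos >= len(text):
                    raise ValueError("missing closing Bracket")
                pos += 1  # consume the End that closes this Pair
                keys.append("".join(buf).strip(WHITESPACE))
                buf = []
            else:
                buf.append(ch)
        tail = "".join(buf)
        if not keys:
            return tail  # plain Fragments: a string value
        if tail.strip(WHITESPACE):
            raise ValueError("Fragments after the last Pair")
        if not any(keys):
            return values  # all keys empty: a List
        if all(keys):
            return dict(zip(keys, values))  # all keys non-empty: a Dictionary
        raise ValueError("mixed empty and non-empty keys")

    result = document()
    if pos < len(text):
        raise ValueError("unmatched closing Bracket")
    return result


sample = "songs [\n    [title [Patterns] length [2:45]]\n    [title [Cloudy] length [2:15]]\n]"
print(json.dumps(decode(sample), indent=4))
```

The choice between List and Dictionary is made per Document from its keys alone, so no extra syntax is needed to tell the two apart.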

Last updated 2020-02-02 at 03:00 CET.


  1. Note: this is not finished or properly formalized but captures the essential idea and should suffice for preliminary assessment and implementation.

  2. In this document a word that is Capitalized and italicized introduces a definition. References to this definition are capitalized but not italicized.

  3. Note: This part of the specification is incomplete and unstable.

  4. Portion between X and Y means that X and Y are themselves not included.