TAO: the JSON family

Note: This is subject to future work. Errors, ambiguities, and omissions are to be expected.

The major good quality of JSON is simplicity.

TAO not only retains it but dramatically increases it while introducing good qualities of its own.

On the other hand JSON is sometimes described as unnecessarily:

And combinations of those.

These are all undesirable traits in a syntax, and they may carry over into the thinking of those who use it. TAO tries to eliminate them by following principles of minimalism.

Restrictiveness

JSON is minimal and simple at its core. This however does not imply that it needs to be restrictive.

The unnecessary restrictiveness of JSON spawns countless incompatible derivatives.

TAO instead is permissive and accommodating, providing space within a maximally simple, generic, and flexible framework.

On the other hand, TAO may easily be restricted by disallowing certain constructs while retaining the essentials, whereas JSON cannot be made syntactically more flexible without breaking compatibility.

Primitives

JSON defines the following primitive data types in its syntax: strings, numbers, the booleans true and false, and null.

TAO instead has only two primitive syntactical data types:

With these, all other primitives can then be built. This is a better approach, because it avoids problems built into JSON which cannot be separated or removed.

Numbers and other primitives

The definition of a JSON number is simple indeed, but it is not sufficient to truly represent numbers syntactically and causes problems of compatibility and precision.

The only numbers that JSON can represent syntactically are positive and negative decimal numbers with optional exponents. JSON as a syntax does not define non-syntactical details such as the precision of those numbers. In practice, however, since JSON is derived from JavaScript, some implementations assume its numbers to follow JavaScript’s limited-precision floating-point number data type. This may lead to significant problems and errors in data interchange.

Moreover, values that are part of JavaScript’s definition of a number, such as Infinity or the NaN value can’t be represented with JSON’s number data type.

This is a real source of practical issues caused by restrictive syntax.
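The precision and special-value problems described above can be reproduced with a short sketch. It uses Python's standard `json` module; passing `parse_int=float` is only an assumption made here to imitate a parser that stores every number as a 64-bit float, as JavaScript-based implementations do.

```python
import json

# 2**53 + 1 is the smallest positive integer a 64-bit float cannot hold exactly.
text = "9007199254740993"

# Python's json module keeps integers exact by default.
exact = json.loads(text)

# Assumption: parse_int=float imitates a parser that treats every
# JSON number as a double-precision float.
as_double = json.loads(text, parse_int=float)

print(exact)           # 9007199254740993
print(int(as_double))  # 9007199254740992

# NaN and Infinity are not part of the JSON number grammar, so a strict
# serializer has to reject them.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True

print(nan_rejected)    # True
```

The same text therefore yields two different values depending on what the reading side assumes, which is exactly the interchange hazard in question.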

TAO approaches the problem differently by providing only the basic syntactical primitives, on top of which grammatical or other restrictions may be imposed to achieve different number definitions without influencing the underlying syntax of TAO. These restrictions are a separate concern and may be domain-specific, or even agreed upon between two individual interchange parties.

Practical use of JSON follows a similar pattern, encoding different kinds of numbers or other primitives with the fallback string type. The built-in number then may be considered an impediment rather than a feature.

This falling back on a generic (usually string) type is a very common pattern in many DTAS and is the only universal solution, since there are many different valid representations of numbers or other primitive data types.
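As an illustration of the string fallback, the sketch below carries two numbers as JSON strings and lets the receiving side decide how to interpret them. The field names `price` and `big` and the agreement that they hold a decimal and an integer are hypothetical, standing in for whatever two interchange parties might settle on.

```python
import json
from decimal import Decimal

# Hypothetical agreement between two parties: "price" is a decimal string,
# "big" is an arbitrary-size integer string.
payload = json.dumps({"price": "0.10", "big": "123456789012345678901234567890"})

decoded = json.loads(payload)

# The number definition is applied after parsing, outside the syntax.
price = Decimal(decoded["price"])
big = int(decoded["big"])

print(price)    # 0.10
print(big + 1)  # 123456789012345678901234567891
```

Neither value passes through JSON's built-in number type, so no precision or trailing zero is lost along the way.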

As UTAS, TAO retains only universal features and so only two primitive syntactical shapes are left which can be restricted arbitrarily. Common restrictions can then be standardized and become universally understood.

Objects

JSON defines two distinct recursive data types: objects and arrays.

TAO instead has only the tree which is mutually recursive with the tao.

A generic recursive data type is a defining feature of TAS. However one is enough.

In JSON the more generic one is the array, so objects could be removed without losing any expressive power of the syntax.

Objects in JSON capture the concept of an unordered key-value collection as specialized syntax. This, however, unnecessarily couples a strictly semantic consideration to the syntax.

The concept of unorderedness may be captured on a separate syntactical or semantic layer. Thus, TAO itself takes no position on the matter, instead offering flexibility.
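The claim that arrays alone suffice can be sketched as follows. The helper `object_to_pairs` is a hypothetical name introduced for this illustration; it rewrites every JSON object as an array of `[key, value]` pairs, leaving the question of ordering to whoever reads the result.

```python
import json

def object_to_pairs(value):
    """Recursively rewrite every JSON object as an array of [key, value] pairs."""
    if isinstance(value, dict):
        return [[key, object_to_pairs(item)] for key, item in value.items()]
    if isinstance(value, list):
        return [object_to_pairs(item) for item in value]
    return value

doc = json.loads('{"city": "New York", "tags": ["a", {"k": 1}]}')
pairs = object_to_pairs(doc)

print(json.dumps(pairs))
# [["city", "New York"], ["tags", ["a", [["k", 1]]]]]

# Treating the pairs as an unordered mapping is a decision made by the
# reader, on a separate layer from the syntax.
as_mapping = dict(pairs)
print(as_mapping["city"])  # New York
```

Note that the array form no longer says whether a given array is a list of pairs or an ordinary list; that distinction has to come from a schema or convention, which is the separate layer the text refers to.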

Reduced JSON as precursor to TAO

A reduced version of JSON could be imagined which not only dropped objects, but all other syntactical constructs except arrays and strings.

This minimal syntax would still be capable, in principle, of encoding arbitrary annotated tree structures. If we modify that further by dropping a few more decorating symbols and loosening the grammar, we arrive at a member of the TAO family.

In this way JSON may be viewed as a restricted version of TAO.

Whitespace

The restrictiveness of JSON’s definition of strings and its treatment of whitespace are features that make it unsuitable as a markup syntax or a truly human-readable data syntax.

In TAO, especially with the concept of a raw tao[1], it is possible to embed any content with the minimum possible syntactical friction.

Comments

JSON, unlike many other syntaxes, does not contain any notion of a comment. This is not in line with the goal of being easy for humans to read and write.

It is another source of countless incompatible JSON derivatives.

A commonly given reason for this restrictiveness is that comments may be used for purposes beyond those they were intended for.

Other features may be misused in the same way, though. The lack of comments may also be worked around with special structures that contain strings in their place.

In fact, strings may fit into almost any use case. This does not imply that they need to be restricted.

TAO takes no position in the matter and is independent of any use cases as such. It does however provide a generic feature that may be used to implement comments.
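The JSON workaround mentioned above, special structures holding strings in place of comments, might look like the following sketch. The `_comment` key is a hypothetical convention chosen for this example; it is not part of JSON, and consumers must agree to ignore it.

```python
import json

COMMENT_KEY = "_comment"  # hypothetical convention, not defined by JSON

def strip_comments(value):
    """Recursively drop every member whose key is COMMENT_KEY."""
    if isinstance(value, dict):
        return {k: strip_comments(v) for k, v in value.items() if k != COMMENT_KEY}
    if isinstance(value, list):
        return [strip_comments(v) for v in value]
    return value

source = (
    '{"_comment": "ports above 1024 only", "port": 8080, '
    '"hosts": [{"_comment": "primary", "name": "a"}]}'
)
config = strip_comments(json.loads(source))

print(config)  # {'port': 8080, 'hosts': [{'name': 'a'}]}
```

The comment here is ordinary data as far as the syntax is concerned, so nothing prevents a consumer from reading meaning into it, which is the very misuse the restriction was meant to prevent.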

Example

The following JSON sample[2]:

{
  "first name": "John",
  "last name": "Smith",
  "age": 25,
  "address": {
    "street address": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postal code": "10021"
  },
  "phone numbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}

may be translated into TAO as:

first name [John]
last name [Smith]
age [25]
address [
    street address [21 2nd Street]
    city [New York]
    state [NY]
    postal code [10021]
]
phone numbers [
    [
        type [home]
        number [212 555-1234]
    ]
    [
        type [fax]
        number [646 555-4567]
    ]
]
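The translation above is mechanical enough to sketch in code. The functions `scalar` and `to_tao` below are hypothetical helpers, not part of any TAO tooling; they assume that scalars become bare text, that no `[` or `]` characters need escaping, and that the four-space layout simply imitates the sample.

```python
import json

INDENT = "    "

def scalar(value):
    """Render a JSON scalar as bare text (assumption: no escaping needed)."""
    return value if isinstance(value, str) else json.dumps(value)

def to_tao(value, depth=0):
    """Return lines of bracketed text for a parsed JSON value."""
    pad = INDENT * depth
    # Objects contribute "key [...]" entries, arrays contribute "[...]" entries.
    if isinstance(value, dict):
        entries = [(f"{key} ", item) for key, item in value.items()]
    elif isinstance(value, list):
        entries = [("", item) for item in value]
    else:
        return [f"{pad}{scalar(value)}"]

    lines = []
    for prefix, item in entries:
        if isinstance(item, (dict, list)):
            lines.append(f"{pad}{prefix}[")
            lines.extend(to_tao(item, depth + 1))
            lines.append(f"{pad}]")
        else:
            lines.append(f"{pad}{prefix}[{scalar(item)}]")
    return lines

sample = json.loads('{"age": 25, "phone numbers": [{"type": "home"}]}')
print("\n".join(to_tao(sample)))
# age [25]
# phone numbers [
#     [
#         type [home]
#     ]
# ]
```

The conversion is lossy in one direction: the number 25 and the string "25" both become the same bare text, so recovering JSON types would require a restriction agreed on top of the syntax.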

  1. Which is only an interpretation of data encoded in TAO and not a special grammatical construct.

  2. Based on a JSON sample from Wikipedia.