Note: This is subject to future work. Errors, ambiguities, and omissions are to be expected.
A Tree Annotation Syntax (TAS) is any syntax based on constructs equivalent to the trees and annotations of TAO. A TAS may include other elements as well. The following syntaxes are all examples of TAS: XML, JSON, S-expressions, HTML, YAML.
TAO and its family are TAS.
All TAS can be translated into one another, so they are fundamentally equivalent. In practice, however, each is used only in a specific range of contexts because of certain properties that make it more suitable there.
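As a rough illustration of such a translation, the sketch below renders a parsed JSON value as an S-expression. The function name `to_sexpr` and the `object`/`array` head symbols are assumptions made for this example; they are one possible convention, not a standard mapping and not part of TAO.

```python
import json


def to_sexpr(value):
    """Render a parsed JSON value as an S-expression string.

    Hypothetical convention: objects become (object (key value) ...),
    arrays become (array item ...), and scalars are written as JSON atoms.
    """
    if isinstance(value, dict):
        items = " ".join(f"({json.dumps(k)} {to_sexpr(v)})" for k, v in value.items())
        return f"(object {items})" if items else "(object)"
    if isinstance(value, list):
        items = " ".join(to_sexpr(v) for v in value)
        return f"(array {items})" if items else "(array)"
    # Strings, numbers, true/false/null are kept as JSON-style atoms.
    return json.dumps(value)


doc = json.loads('{"name": "tao", "tags": ["tree", "annotation"], "version": 1}')
print(to_sexpr(doc))
# (object ("name" "tao") ("tags" (array "tree" "annotation")) ("version" 1))
```

The tree structure survives the translation unchanged; only the surface notation differs, which is the sense in which the two syntaxes are equivalent.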
Some have very broad ranges. For example, JSON is used today as a universal human-readable syntax for data interchange, with derivatives used in other contexts, such as application configuration. Before JSON, XML was a common language in those domains, and it is still widely used today. Its close relative, HTML, is the standard markup language of the web. Both HTML and XML were defined according to SGML, which predates the web, with origins reaching as far back as the 1970s. Before that still, in 1960, S-expressions emerged: a milestone minimalist syntax with derivatives still in wide use.
These syntaxes and their derivatives are the major TAS families in use today.
Both the JSON and S-expression families can be further classified as Data TAS (DTAS).
Next to them, XML and HTML may be classified as (text) Markup TAS (MTAS).
DTAS and MTAS were designed for different purposes and have fundamental differences, for example in whether they try to accommodate human or machine users more[1], in the flexibility of their basic constructs, and in the total number of constructs available.
DTAS are more minimal, flexible, and machine-friendly, while remaining human-readable.
MTAS are more complex with more elements and restrictions. They are often used to mark up content by hand for rendering by web browsers or other programs.
DTAS are sometimes used to represent MTAS, but not the other way around. The DTAS representations of MTAS are, however, rarely written by humans.
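To make this concrete, here is a minimal sketch that converts a small piece of XML markup into a JSON structure using Python's standard `xml.etree.ElementTree` module. The `[tag, attributes, child, ...]` layout and the helper name `element_to_data` are assumptions chosen for illustration; other encodings are equally possible.

```python
import json
import xml.etree.ElementTree as ET


def element_to_data(elem):
    """Convert an element to [tag, attributes, child, ...], preserving mixed text."""
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)  # text before the first child element
    for child in elem:
        node.append(element_to_data(child))
        if child.tail:
            node.append(child.tail)  # text following this child element
    return node


markup = '<p class="note">Hello, <em>world</em>!</p>'
data = element_to_data(ET.fromstring(markup))
print(json.dumps(data))
# ["p", {"class": "note"}, "Hello, ", ["em", {}, "world"], "!"]
```

The JSON form carries the same information, but every run of text must be quoted and every element bracketed explicitly, which suggests why such representations are usually generated by programs rather than typed by hand.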
It may then be reasonable to conclude that DTAS are more generic than MTAS, and that a Universal TAS (UTAS) may exist which is well-suited both for encoding data and for marking up content.
If a UTAS can indeed exist, there are two possibilities:
TAO is the result of an attempt to find such a UTAS on the assumption that it must be the latter. It follows the principles of DTAS which are the source of their generic nature: minimality and flexibility, taking them to a coherent extreme. In getting rid of a few more inessential elements, perhaps unexpectedly, it becomes a promising basis for MTAS as well.
TAO aims to remove the dichotomy between a complicated syntax for loose or mixed structures and a simple syntax for rigid structures by simplifying further and lifting unnecessary fundamental restrictions.
If this is indeed well-founded and TAO is close to a UTAS, it should naturally, gradually, and gracefully replace XML, HTML, JSON, S-expressions, and other syntaxes in most contexts.
[1] This is reflected, for example, in the way they treat freeform text and whitespace.