Note: This is subject to future work. Errors, ambiguities, and omissions are to be expected.
The major good qualities of HTML and XML are:
the ability to naturally mix structure into freeform text, with a low number of meta symbols that may interrupt the text and thus need to be escaped
whitespace flexibility
TAO retains these.
On the other hand HTML and XML are sometimes described as unnecessarily:
And combinations of those.
These are all undesirable traits in a syntax, and they may carry over into the thinking of those who use it. TAO tries to eliminate them, following principles of minimalism.
XML’s syntax has a major and unnecessary impediment to extension: the ubiquitous attributes. They reintroduce some compactness to the syntax at the expense of making extensibility difficult. This is not a necessary trade-off. For example, a compact piece of XML like this [1]:
<person firstName="John" lastName="Smith" age="25" address="New York" />
may be expressed in TAO as:
person [first name [John] last name [Smith] age [25] address [New York]]
If we were now to extend the address value to include more structured information, we may do it like so in TAO:
person [first name [John] last name [Smith] age [25] address [street address [21 2nd Street] city [New York] state [NY] postal code [10021]] ]
This is simply adding another level of nesting inside of the value.
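The nesting argument can be made concrete with a minimal Python sketch. It assumes a TAO grammar inferred from the examples in this text (a tree is `[...]`, an op is a backtick followed by one character, and everything else is a note); the names `parse_tao` and `to_data` and the key [value] reading of the data are hypothetical illustrations, not part of any TAO specification.

```python
def parse_tao(source):
    """Parse TAO text into a list of parts.

    Assumed grammar: notes become str, ops become ("op", char),
    trees become nested lists of parts.
    """
    pos = 0

    def parse_level(nested):
        nonlocal pos
        parts, note = [], []

        def flush():
            # Close the note collected so far, if any.
            if note:
                parts.append("".join(note))
                note.clear()

        while pos < len(source):
            ch = source[pos]
            if ch == "[":
                flush()
                pos += 1
                parts.append(parse_level(True))
            elif ch == "]":
                if not nested:
                    raise ValueError(f"unexpected ']' at position {pos}")
                flush()
                pos += 1
                return parts
            elif ch == "`":
                if pos + 1 >= len(source):
                    raise ValueError("op symbol at end of input")
                flush()
                parts.append(("op", source[pos + 1]))
                pos += 2
            else:
                note.append(ch)
                pos += 1
        if nested:
            raise ValueError("missing closing ']'")
        flush()
        return parts

    return parse_level(False)


def to_data(parts):
    """Read parts as key [value] pairs (hypothetical convention).

    A tree holding only notes is a primitive string; anything else is a dict.
    Ops are ignored in this data-oriented reading.
    """
    if all(isinstance(p, str) for p in parts):
        return "".join(parts).strip()
    data, key = {}, None
    for part in parts:
        if isinstance(part, str):
            if part.strip():
                key = part.strip()
        elif isinstance(part, list):
            data[key] = to_data(part)
            key = None
    return data


compact = to_data(parse_tao(
    "person [first name [John] last name [Smith] age [25] address [New York]]"
))
extended = to_data(parse_tao(
    "person [first name [John] last name [Smith] age [25] "
    "address [street address [21 2nd Street] city [New York] "
    "state [NY] postal code [10021]] ]"
))
```

Under these assumptions the same two functions handle both documents unchanged: in `compact` the address is a string, while in `extended` it is simply one more nested dictionary.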
In XML though we would have to change the structure significantly to achieve the same result:
<person firstName="John" lastName="Smith" age="25">
<address streetAddress="21 2nd Street" city="New York" state="NY" postalCode="10021" />
</person>
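A short Python sketch, using only the standard library's `xml.etree.ElementTree`, illustrates what this restructuring means for code that reads the data. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

flat = ET.fromstring(
    '<person firstName="John" lastName="Smith" age="25" address="New York" />'
)
nested = ET.fromstring(
    '<person firstName="John" lastName="Smith" age="25">'
    '<address streetAddress="21 2nd Street" city="New York" '
    'state="NY" postalCode="10021" />'
    '</person>'
)

# In the compact form the address is an attribute, i.e. a plain string.
flat_address = flat.get("address")

# After the extension the address is a child element, so it must be
# looked up with a different call and its parts read as attributes.
address_element = nested.find("address")
nested_city = address_element.get("city")

# The old access path no longer finds anything.
old_path_result = nested.get("address")
```

An attribute can only hold a string, so giving the address internal structure moves it from the attribute list into the element's children, and every consumer that used the old access path has to change.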
For this reason much use of XML as DTAS avoids attributes altogether (reintroducing redundancy) or devises various complicated workarounds for the shortcomings of the syntax. This is one of the reasons why XML, as DTAS, was largely superseded by the much simpler and less restrictive (though still restrictive) JSON.
This hints at how unnecessary complexity and restrictiveness of a syntax may propagate into thinking and inform design.
The above example shows XML used as DTAS, for which it was not originally designed. After all, it is a markup language, as is the closely related HTML. In markup, HTML, even though imperfect, still prevails as a good enough solution. This is due to its aforementioned good qualities.
An HTML fragment like this, for example:
<p>TAO (<a href="tao.html">Tree Annotation Operator</a>) is a unique and extremely simple
<a href="tas.html">Tree Annotation Syntax</a> which can be used in
<a href="contexts.html">all kinds of contexts</a>. It is easily readable and writable
by humans and machines and has only three basic constructs</p>
looks like a mixture of freeform text with parts marked up as hyperlinks. It is still a tree, since this is the ultimate syntactical structure, but less apparently so than a typical piece of data like the above XML example.
In this context the way HTML appears fits more naturally and conveniently into its domain. It still has problems, such as redundancy, which is one of the reasons it is being replaced in some contexts by more lightweight markup syntaxes, like Markdown. However, these are much more specialized and so are not suitable as an underlying syntax of the World Wide Web. And so HTML remains.
A lightweight and generic syntax would therefore be ideal. This is what TAO might offer:
[p`>TAO ([a href[tao.html]`>Tree Annotation Operator]) is a unique and extremely simple [a href[tas.html]`>Tree Annotation Syntax] which can be used in [a href[contexts.html]`>all kinds of contexts]. It is easily readable and writable by humans and machines and has only three basic constructs]
This is the same fragment as above, closely translated to TAO, without changing the structure of HTML or trying for extra compactness. But it already removes some redundancy and opens up avenues for lifting of restrictions.
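Such a near 1:1 translation can be sketched mechanically in Python. The sketch below assumes conventions read off the example above rather than any specification: inside a tree, the text before the `>` op is the head (a tag name followed by name[value] attribute pairs) and everything after it is the content. The function names `parse_tao`, `text_of`, `to_html` and `element_to_html` are hypothetical.

```python
from html import escape


def parse_tao(source):
    """Parse TAO into parts: str (note), ("op", char), or list (tree)."""
    pos = 0

    def parse_level(nested):
        nonlocal pos
        parts, note = [], []

        def flush():
            if note:
                parts.append("".join(note))
                note.clear()

        while pos < len(source):
            ch = source[pos]
            if ch == "[":
                flush()
                pos += 1
                parts.append(parse_level(True))
            elif ch == "]":
                if not nested:
                    raise ValueError(f"unexpected ']' at position {pos}")
                flush()
                pos += 1
                return parts
            elif ch == "`":
                if pos + 1 >= len(source):
                    raise ValueError("op symbol at end of input")
                flush()
                parts.append(("op", source[pos + 1]))
                pos += 2
            else:
                note.append(ch)
                pos += 1
        if nested:
            raise ValueError("missing closing ']'")
        flush()
        return parts

    return parse_level(False)


def text_of(parts):
    """Concatenate the notes of a tree, ignoring ops and subtrees."""
    return "".join(p for p in parts if isinstance(p, str))


def to_html(parts):
    """Translate a sequence of parts: notes become text, trees become elements."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(escape(part, quote=False))
        elif isinstance(part, tuple):
            # An op outside a head: emit its character literally.
            out.append(escape(part[1], quote=False))
        else:
            out.append(element_to_html(part))
    return "".join(out)


def element_to_html(tree):
    """Translate one tree of the assumed form [tag name[value]... `> content]."""
    if ("op", ">") not in tree:
        raise ValueError("tree has no `> op separating head from content")
    split = tree.index(("op", ">"))
    head, body = tree[:split], tree[split + 1:]
    tag, attrs, name = None, [], None
    for part in head:
        if isinstance(part, str):
            words = part.split()
            if tag is None and words:
                tag, words = words[0], words[1:]
            if words:
                name = words[-1]  # attribute name awaiting its [value]
        elif isinstance(part, list) and name is not None:
            attrs.append(f' {name}="{escape(text_of(part))}"')
            name = None
    if tag is None:
        raise ValueError("missing tag name")
    return f"<{tag}{''.join(attrs)}>{to_html(body)}</{tag}>"


fragment = "[p`>See [a href[tao.html]`>TAO] for details.]"
html_out = to_html(parse_tao(fragment))
```

Applied to the full TAO paragraph above, the same functions should yield markup equivalent to the original HTML fragment, apart from line breaks. The sketch only handles primitive attribute values; structured values are deliberately left out.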
For example, in the general case we might allow extensible TAO-HTML attributes, which would be translated into HTML attributes if they consist of a primitive, or into an HTML tree if they contain a structure, similarly to the above XML example.
In this way we might slowly evolve out of HTML, starting at near 1:1 translation and going from there into a better MTAS.
[1] Based on an example from Wikipedia.