Did John McCarthy lament SGML?

✍️ Written on 2024-11-28 in 1345 words.

Question

Marty Alain, creator of the {λy} project, uses the following quote:

Let’s remember that the father of LISP, John McCarthy, lamented the W3C’s choice of SGML as the basis for HTML : « An environment where the markup, styling and scripting is all s-expression based would be nice. »

— ‘the {λy} project’ homepage

Is this quote genuine though?

Motivation

In a report I am working on, I discuss markup languages. The {λy} project by Marty Alain proposes one such language. Whereas W. Crichton and S. Krishnamurthi in “A Core Calculus for Documents: Or, Lambda: The Ultimate Document” correctly analyzed that there are different document languages (cf. Table 1), {λy} squishes them into one. The proposed language is an S-expression-based language incorporating computational capabilities (with lambda calculus as the obvious choice), CSS for styling, and HTML for structural information (the obvious choice if one assumes the WWW as the only output format).

For the reader, I present one {λy} example:

{div {@ style="font:normal 3.0em optima;
                text-align:center;
                text-shadow:0 0 8px #000;
                transform:rotate(-5deg);"}
      hello world}

Ok, this is pretty sweet. With macros (not shown in the example) we get computational semantics. With div and @ (shown in the example) we can build trees. As a result, we can write HTML with a different, S-expression-based syntax. But wait… what is the relation between S-expressions and HTML? What did John McCarthy say about SGML?

Analysis

Apparently Marty was made aware of the quote by spacebat on reddit in 2014-02:

Reminds me of John McCarthy’s lament at the W3C’s choice of SGML as the basis for HTML. An environment where the markup, styling and scripting is all s-expression based would be nice. […]

— reddit

Did he say that though? Searching for it led me to Chas Emerick mentioning that John McCarthy gave a keynote at OOPSLA 2007.

[…] Anyway, he delivers a great zinger at the end, a tangential response to a question about the aims of Semantic Web technologies as they relate to his work on Elephant:

“When w3c decided to not use [s-expressions], but instead imitate SGML [for HTML], that showed a certain capacity to make mistakes — which, probably, they haven’t lost.”

Nice.

— McCarthy on the W3C

Okay, nice. Chas even provides a quote. Is it genuine though? Well, “Episode 21: Keynote — John McCarthy” actually has a recording and he provides that quote at timestamp 1:13:15:

“When W3C decided not to use LISP format but to imitate SGML for that showed a certain capacity to make mistakes” laughter “which they probably hadn’t lost”

— John McCarthy in 2007

On markup languages and S-expressions

On minimality

John McCarthy introduced S-expressions in his 1960 paper “Recursive Functions of Symbolic Expressions and Their Computation by Machine” as a reduced syntax compared to his M-expressions.

S-expressions were the inspiration for the family of LISP programming languages. Usually LISP is presented as a syntactically minimal programming language where S-expressions form the syntactic basis. But neither of them is minimal in any way. Whereas the S-expression (x y z) represents this minimalism, the syntax (x . (y . (z . NIL))) must be semantically equivalent. So there are two intersecting syntaxes, where a dot in second position might suddenly mean either “join these two children into a pair” or “an atom named dot as the second list item” (hint: the second interpretation is usually disallowed with some “bad dot syntax” error or the like).
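The equivalence of the two notations can be made concrete with a small sketch. It is not taken from any LISP implementation: the names NIL, cons, from_list and show_dotted are hypothetical helpers, and Python tuples stand in for pairs.

```python
NIL = None  # assumption: None stands in for LISP's NIL


def cons(head, tail):
    """Build one pair (a "cons cell") as a Python tuple."""
    return (head, tail)


def from_list(items):
    """Build the chain of pairs behind the list notation, e.g. (x y z)."""
    cell = NIL
    for item in reversed(items):
        cell = cons(item, cell)
    return cell


def show_dotted(cell):
    """Render a structure of pairs in fully dotted notation."""
    if cell is NIL:
        return "NIL"
    if isinstance(cell, tuple):
        return f"({show_dotted(cell[0])} . {show_dotted(cell[1])})"
    return str(cell)


# The list (x y z) and the explicit pairs denote the same structure.
as_list = from_list(["x", "y", "z"])
as_pairs = cons("x", cons("y", cons("z", NIL)))
print(as_list == as_pairs)   # True
print(show_dotted(as_list))  # (x . (y . (z . NIL)))
```

Both spellings end up as the same nested pairs, which is why a reader has to accept both of them and has to treat the dot specially.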

To summarize, S-expressions are not minimal because of the dot special cases. And LISP is not minimal in two ways: First, S-expressions are implied as syntax and additional syntactic elements are added (e.g. a leading single quote wraps the following expression in quote). Second, LISP implies both syntax and semantics. They could be split up, but if I use LISP as terminology, I have a particular computational system including syntactical properties at hand.

In essence, one could simply declare an expression to be either an atom like foobar or a sequence of space-separated expressions wrapped by ( and ). This would be an impractical[1] but minimal syntax. Let us call this grammar PAREN. PAREN is syntactically minimal if you want to represent this hierarchical structure with barewords.
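To illustrate how little machinery PAREN needs, here is a minimal parser sketch. The function name parse_paren and the choice of Python lists and strings as the result are my assumptions, not part of any specification.

```python
def parse_paren(text):
    """Parse one PAREN expression into a str (atom) or a nested list."""
    # Pad parentheses with spaces so that a plain split() tokenizes the input.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # a bareword atom
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing parenthesis
        return items

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return tree


print(parse_paren("(div (bold hello) world)"))  # ['div', ['bold', 'hello'], 'world']
```

Note that in this sketch a dot is just another atom: parse_paren("(x . y)") yields a list of three atoms, so the special cases discussed above simply disappear.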

Mapping PAREN to XML

One could create a map between PAREN and SGML. In fact, SGML is unnecessarily complex as an input format. Thus let us talk about mapping PAREN to XML instead, because equivalent concepts apply. So a PAREN expression (bold text) could map to <bold>text</bold>. In that sense, XML is just a repetitive syntax (bold is written twice) compared to PAREN. But what about the PAREN expression (sum 1 3)? <sum>1 3</sum> would be awkward, because we would implicitly introduce spaces as item delimiters within text nodes. So maybe <sum><item>1</item><item>3</item></sum> or <sum arg1="1" arg2="3"/> would be more appropriate. How about XML namespaces and XML processing instructions? Well, one could assign a special meaning to colons in the name and to leading question marks, just like in XML.

The result is not really beautiful, but a mapping in both directions can be constructed.
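One possible mapping in both directions can be sketched as follows. This is only one convention among several: it assumes a PAREN expression is already available as a nested Python list, that the first list item names the element, that every atom argument is wrapped in the <item> variant, and that the name item is therefore reserved. The function names are hypothetical; the XML handling uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def paren_to_xml(tree):
    """Map a nested list such as ["sum", "1", "3"] to an XML element."""
    head, *args = tree
    if not isinstance(head, str):
        raise ValueError("the first list item must be an atom naming the element")
    element = ET.Element(head)
    for arg in args:
        if isinstance(arg, str):
            # Atoms get an explicit wrapper, so spaces never act as delimiters.
            ET.SubElement(element, "item").text = arg
        else:
            element.append(paren_to_xml(arg))
    return element


def xml_to_paren(element):
    """Inverse direction: rebuild the nested list from the element tree."""
    tree = [element.tag]
    for child in element:
        if child.tag == "item" and len(child) == 0:
            tree.append(child.text or "")
        else:
            tree.append(xml_to_paren(child))
    return tree


xml = paren_to_xml(["sum", "1", "3"])
print(ET.tostring(xml, encoding="unicode"))  # <sum><item>1</item><item>3</item></sum>
print(xml_to_paren(xml))                     # ['sum', '1', '3']
```

The sketch does not check that atoms are valid XML names, and it ignores attributes, namespaces and processing instructions, which is exactly where the mapping stops being pretty.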

S-expression based technologies

Today we know that S-expressions are used less often than XML notation. But S-expressions have been used in multiple standards and technologies since the beginning.

Conclusion

Ok, so we originally started with a quote.

Let’s remember that the father of LISP, John McCarthy, lamented the W3C’s choice of SGML as the basis for HTML : « An environment where the markup, styling and scripting is all s-expression based would be nice. »

— ‘the {λy} project’ homepage

Did McCarthy lament? I would claim yes. His statement at OOPSLA 2007 indicates so.
Did he imagine this environment? The second sentence comes from reddit, and I am not at all sure whether Marty accidentally attributes to McCarthy a sentence that spacebat meant as a personal remark. A search for the origin of this second sentence led to no result.

When it comes to the topic of markup languages, I cannot see how people think that squishing syntactic and semantic concerns together makes sense. For example (I think this is related), I cannot figure out the name of Marty’s language: alphawiki, λtalk, or λy? The boundaries are not clear. But syntax and semantics would be a simple boundary providing separation of concerns.

And regarding S-expressions (or PAREN) and SGML (or XML), it needs to be pointed out that XML has in fact won technologically[2]. But we should never forget where the limits of our technologies are (syntactically, the minimum is PAREN).


1. How does one represent an atom with spaces in its name?
2. There is no XQuery/XSD/… for S-expressions. Many fundamental information technologies are built on top of XML, not S-expressions.