litua release 2.0

Written on 2023-05-03 in 1177 words ✍️.
Part of project typho

Motivation

Text documents occur in many contexts. We like them as a simple means to document ideas and concepts, and they help us communicate. But sometimes we want to transform them into other text formats or process their content.

litua arose from a discussion on Hacker News where someone described a complicated way to process a text file containing code snippets. That person used a complex software stack just to generate a code version for the compiler and a text version for humans, following the idea of literate programming. I thought I could implement the same thing more simply and, at the same time, more generically. I think I succeeded. I still named the project after this very specific use case: literate programming with Lua.

I introduced litua in a lightning talk at Grazer Linuxtage 2023. This article is the written equivalent.

Update 2023-05-04: Karl suggested renaming lml because of a naming collision with LMLs. I followed the advice. Thank you!

Litua input syntax (cpil)

First of all, I defined my own markup language. The main idea was to create something more generic than XML. The other goal was to keep the syntax as minimal as possible, similar to LISP. Consider this example:

{element[attr1=value1][attribute2=val2] text content of element}
  • element is the name of the markup element. Its name indicates its semantics. In litua, we define these semantics ourselves.

  • attr1 and attribute2 are attributes of this markup element. They give more details about the element; for example, an attribute could name the font face used to render it. Here, the attribute attr1 is associated with the value value1, and the attribute attribute2 is associated with the value val2.

  • text content of element is the content affected by this markup.

Of course, one can nest elements:

{bold[font-face=Bullshit Sans] {italic Blockchain managed information density}}

In this sense, litua input syntax is very similar to XML (<element attr1="value1" attribute2="val2">text content of element</element>), LISP (e.g. (element :attr1 "value1" :attribute2 "val2" "text content of element")), and markup languages in general. Unlike XML, however, attribute values can themselves contain arbitrary markup, which makes the syntax more generic.

I also thought about escaping and followed conclusions I had drawn previously: if you literally need a { or } in your document, write {left-curly-brace} or {right-curly-brace} respectively instead. litua input syntax files must always be encoded in UTF-8.
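When embedding arbitrary text (for example source code) into a .lit file, this escaping can be automated. The following is a minimal Lua sketch; escape_cpil is a hypothetical helper, not part of litua, that replaces every brace with the corresponding escape element:

```lua
-- Hypothetical helper (not part of litua): escape literal curly braces
-- so that plain text can be embedded in a cpil document.
local replacements = {
  ["{"] = "{left-curly-brace}",
  ["}"] = "{right-curly-brace}",
}

local function escape_cpil(text)
  -- gsub with a table replaces each match by replacements[match];
  -- the outer parentheses drop gsub's second return value (the count)
  return (text:gsub("[{}]", replacements))
end

print(escape_cpil("local t = {}"))
-- prints: local t = {left-curly-brace}{right-curly-brace}
```

gsub does not rescan its own replacements, so the braces inside the escape elements are not escaped again.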

Finally, I named this syntax cpil for curly-brace prefix input language (file extension .lit), but it really just helped me get started with writing parsers for markup languages. Arbitrary languages can be added as input formats. Specifically, version 2.1 is going to ship XML as an input syntax.

Processing a document

If you now store this content as doc.lit and run litua on it, a corresponding .out file is written. Without any hooks, its content is identical to the input:

bash$  cat doc.lit
{element[attr1=value1][attribute2=val2] text content of element}
bash$  litua doc.lit
bash$  cat doc.out
{element[attr1=value1][attribute2=val2] text content of element}

It becomes interesting if you learn that the document is read into a data structure. Specifically, a data structure in the Lua programming language.

local node = {
    ["call"] = "element",
    ["args"] = {
        ["attr1"] = { [1] = "value1" },
        ["attribute2"] = { [1] = "val2" }
    },
    ["content"] = {
        [1] = "text content of element"
    },
}

I call the name of a markup unit call (as in function call), because it follows the idea of calling a function that processes its content. args maps attribute names to values, but a value is not a single string; it is an array of nodes. The same holds for content. A node is either a string or a table with call, args, and content entries.
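To make the mapping from markup to nodes concrete, here is a minimal Lua sketch of a recursive-descent parser producing tables of this shape. It is not litua's parser. The names parse, parse_element, and parse_nodes are hypothetical, and the whitespace rule (exactly one whitespace character separates the call and its attributes from the content) is an assumption based on the examples above.

```lua
-- Minimal cpil parser sketch, NOT litua's own parser.
-- Assumptions: call names and attribute keys contain no whitespace or
-- brackets, exactly one whitespace character precedes the content, and
-- {left-curly-brace}/{right-curly-brace} are kept as ordinary elements.

local parse_nodes -- forward declaration (parse_element and parse_nodes recurse)

-- Parses one element starting at position i, where s:sub(i, i) == "{".
-- Returns the node table and the position right after its closing "}".
local function parse_element(s, i)
  -- the call name ends at whitespace, a bracket, or a curly brace
  local name_end = s:find("[%s%[%]{}]", i + 1) or (#s + 1)
  local node = { call = s:sub(i + 1, name_end - 1), args = {}, content = {} }
  if node.call == "" then
    error("empty call name at position " .. i)
  end

  -- attributes: [key=value]; each value is itself an array of nodes
  local j = name_end
  while s:sub(j, j) == "[" do
    local eq = s:find("=", j + 1, true)
    if not eq then
      error("attribute without '=' at position " .. j)
    end
    local key = s:sub(j + 1, eq - 1)
    local value
    value, j = parse_nodes(s, eq + 1, "]")
    node.args[key] = value
    j = j + 1 -- skip the closing "]"
  end

  -- consume the single whitespace character before the content
  if s:sub(j, j):match("%s") then
    j = j + 1
  end

  node.content, j = parse_nodes(s, j, "}")
  return node, j + 1 -- skip the closing "}"
end

-- Parses text and elements until the terminator character (nil means
-- end of input). Returns the array of nodes and the terminator's position.
function parse_nodes(s, i, terminator)
  local nodes = {}
  local text_start = i
  while i <= #s do
    local c = s:sub(i, i)
    if c == terminator then
      break
    elseif c == "{" then
      if i > text_start then
        nodes[#nodes + 1] = s:sub(text_start, i - 1)
      end
      local child
      child, i = parse_element(s, i)
      nodes[#nodes + 1] = child
      text_start = i
    elseif c == "}" then
      error("unbalanced '}' at position " .. i)
    else
      i = i + 1
    end
  end
  if terminator and i > #s then
    error("missing '" .. terminator .. "'")
  end
  if i > text_start then
    nodes[#nodes + 1] = s:sub(text_start, i - 1)
  end
  return nodes, i
end

-- Parses a whole document into an array of top-level nodes
local function parse(s)
  return (parse_nodes(s, 1, nil))
end

local doc = parse("{bold[font-face=Bullshit Sans] {italic Blockchain managed information density}}")
print(doc[1].call)                  -- bold
print(doc[1].args["font-face"][1])  -- Bullshit Sans
print(doc[1].content[1].content[1]) -- Blockchain managed information density
```

Because attribute values are parsed with the same parse_nodes function as content, markup inside attribute values falls out naturally.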

The interesting part is that we can take this data structure, rearrange the document, and process it with hooks.
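As an illustration of rearranging the tree, the following sketch writes nodes back as cpil and renames calls recursively. Without changes, serialization reproduces the input, which is consistent with the doc.out shown earlier. The helpers to_cpil, nodes_to_cpil, and rename_calls are hypothetical, not litua's implementation. Sorting the attribute keys is an assumption needed because Lua tables do not preserve insertion order.

```lua
-- Sketch: rewrite a node tree, then serialize it back to cpil.
-- All helpers here are hypothetical, not litua's implementation.

-- Escape literal curly braces in text nodes
local function escape_text(text)
  return (text:gsub("[{}]", {
    ["{"] = "{left-curly-brace}",
    ["}"] = "{right-curly-brace}",
  }))
end

local to_cpil -- forward declaration (to_cpil and nodes_to_cpil recurse)

-- Serialize an array of nodes by concatenating each serialized node
local function nodes_to_cpil(nodes)
  local parts = {}
  for _, node in ipairs(nodes) do
    parts[#parts + 1] = to_cpil(node)
  end
  return table.concat(parts)
end

-- Serialize one node: strings are escaped, tables become {call[k=v] content}
function to_cpil(node)
  if type(node) == "string" then
    return escape_text(node)
  end
  -- Lua tables are unordered, so sort attribute keys for stable output
  local keys = {}
  for key in pairs(node.args) do
    keys[#keys + 1] = key
  end
  table.sort(keys)
  local out = { "{", node.call }
  for _, key in ipairs(keys) do
    out[#out + 1] = "[" .. key .. "=" .. nodes_to_cpil(node.args[key]) .. "]"
  end
  if #node.content > 0 then
    out[#out + 1] = " " .. nodes_to_cpil(node.content)
  end
  out[#out + 1] = "}"
  return table.concat(out)
end

-- A simple rearrangement: rename every occurrence of a call, recursively
local function rename_calls(node, from, to)
  if type(node) == "string" then
    return node
  end
  if node.call == from then
    node.call = to
  end
  for _, value in pairs(node.args) do
    for _, child in ipairs(value) do
      rename_calls(child, from, to)
    end
  end
  for _, child in ipairs(node.content) do
    rename_calls(child, from, to)
  end
  return node
end

local node = {
  call = "element",
  args = { attr1 = { "value1" }, attribute2 = { "val2" } },
  content = { "text content of element" },
}
print(to_cpil(node))
-- {element[attr1=value1][attribute2=val2] text content of element}
print(to_cpil(rename_calls(node, "element", "quote")))
-- {quote[attr1=value1][attribute2=val2] text content of element}
```

rename_calls modifies the tree in place and also returns it, so calls can be chained before serializing.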

The concept of hooks

Besides doc.lit, we can also provide a hooks file. The file needs to be in the same directory, its name must start with the letters hooks, and it must end with .lua. For example, hooks.lua:

-- Replace every "element" node by a sentence built from its call name and first content item
Litua.convert_node_to_string("element", function (node)
    return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end)

If we now run litua, the behavior changes:

bash$  cat doc.lit
{element[attr1=value1][attribute2=val2] text content of element}
bash$  litua doc.lit
bash$  cat doc.out
The element said: text content of element

What did we do? We asked litua to change its behavior when it “converts a node to a string” for the call “element”. The behavior is defined in a hook, which is a Lua function. If you look closely, you can spot node.call in the function, which accesses the call entry of the data structure above. Neat, right? Depending on the name of the call, we can implement different behavior.
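The dispatch idea can be modelled in a few lines of plain Lua. MiniLitua below is a hypothetical stand-in, not litua's internals: it only mirrors the call shape of Litua.convert_node_to_string from hooks.lua, stores one hook per call name, and looks it up by node.call. To stay short, its fallback concatenates the content, whereas litua reproduces the input when no hook applies, as the first doc.out example shows.

```lua
-- Toy model of hook dispatch. MiniLitua is hypothetical; it only mirrors
-- the call shape of Litua.convert_node_to_string used in hooks.lua.
local MiniLitua = { hooks = {} }

-- Register a hook for one call name
function MiniLitua.convert_node_to_string(call, hook)
  MiniLitua.hooks[call] = hook
end

-- Convert a node: use the hook registered for node.call if there is one,
-- otherwise (simplification) concatenate the converted content
local function node_to_string(node)
  if type(node) == "string" then
    return node
  end
  local hook = MiniLitua.hooks[node.call]
  if hook then
    return hook(node)
  end
  local parts = {}
  for _, child in ipairs(node.content) do
    parts[#parts + 1] = node_to_string(child)
  end
  return table.concat(parts)
end

-- The hook from hooks.lua, registered with the stand-in
MiniLitua.convert_node_to_string("element", function (node)
  return "The " .. tostring(node.call) .. " said: " .. tostring(node.content[1])
end)

local node = {
  call = "element",
  args = { attr1 = { "value1" }, attribute2 = { "val2" } },
  content = { "text content of element" },
}
print(node_to_string(node)) -- The element said: text content of element
```

Keying the hooks by call name keeps the lookup a single table access, which matches the idea that the call name selects the behavior.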

Going from there

There are several hooks besides convert_node_to_string that you can use; I documented them in the README. I also recommend taking a look at the examples to see implementations for other use cases. Litua is MIT-licensed and ready for use, because I iterated a lot to deliver a useful tool. Getting started is trivial: download the latest release and run it.

Conclusion

Litua resulted from my thought “I can do the same in a more user-friendly and more generic way”. I think it is a neat tool, and it also served as my testbed for parsing markup languages and for interfacing Lua with Rust. My current design for typho follows an approach similar to litua's to preprocess input documents before they are passed to the typesetting engine.