Motivation
I came across finl recently. Somehow I missed it before, even though Don’s blog articles started in 2020. What is it about? finl is another project trying to replace LaTeX, or rather, it is supposed to be “reinventing LaTeX for the twenty-first century”. So this project is a competitor to my typho project. It is even one of the few competitors written in Rust, and Don also documented his process of discovering Rust. After reading all available resources about finl, I wrote this blog post to discuss the existing design decisions. The top-most comment is harsh:
“Sounds like attempt n+1.”
Resources
The author is Don Hosek.
A review of his blog articles
I now look at the blog articles from the oldest to the latest (up to “Announcing finl_unicode 1.0.0”). I skip some articles that are not relevant for the discussion here. I strike out points which are deprecated due to an updated blog post.
“Why finl? A manifesto”
- I agree with most of the points. There are plenty of texts describing the limitations of TeX. The article mentions some of them.
- In general, the fragmentation of TeX engines shows that the community is large and active.
- The assumption that a document “should allow one to just switch from one documentclass to another” is indeed also my assumption, but I have never seen this guarantee written down anywhere. I am not sure it is supposed to hold universally for LaTeX.
“Choosing a programming language”
- It is neat to discuss the programming language, but this is completely contrary to the previous point that the usability of TeX is bad. From a programmer’s perspective the choice of language is central, and it may even have important implications for the user, but I would like to see a language-agnostic look at digital typesetting. Yes, interfacing Swift with Java will be more difficult than interfacing C++ with Java. But one needs to be aware that the community of people typesetting documents is huge: if an interface is possible, someone will implement it. There are interfaces between Python, Perl, and LaTeX. As a result, I would like to look at this from a language-agnostic point of view, where the only questions are which interfaces are defined and what they are used for. Should the user be able to provide their own implementation of components? Or is it all just about configuration? Which data can be passed through such an interface, and what can be done with it? The language is a secondary decision, IMHO.
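To make the interface question concrete, here is a hypothetical sketch of such a language-agnostic component boundary. All names are mine, not finl’s; the point is only that once such a boundary exists, the implementation language behind it becomes secondary.

```rust
/// Hypothetical component boundary for a typesetting engine.
/// Nothing here is finl's actual API; it only illustrates that
/// the *interface* matters more than the implementation language.
trait Hyphenator {
    /// Returns the byte offsets at which `word` may be broken.
    fn break_points(&self, word: &str) -> Vec<usize>;
}

/// A trivial stand-in implementation; a user could swap in one
/// written in any language behind a C ABI or a pipe protocol.
struct NoHyphenation;

impl Hyphenator for NoHyphenation {
    fn break_points(&self, _word: &str) -> Vec<usize> {
        Vec::new() // never hyphenate
    }
}

fn main() {
    let h: Box<dyn Hyphenator> = Box::new(NoHyphenation);
    assert!(h.break_points("typesetting").is_empty());
}
```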
“Defining a document markup language for finl”
- The grammar defined here comes with no statement regarding its complexity. It seems to be defined by an arbitrary subset of LaTeX commands and is provided without a formal definition. For example, the verbatim mechanism of \verb+some_verbatim+ is shown, which implies certain assumptions about the memory of the lexer (specifically, it needs to remember the lexeme + to match it again later; see the sketch after this list). Sorry, I am aware that I am nit-picking here, but the world of poorly defined grammars is too large not to be bothered.
- The choice of types is poor. Specifically, the types {parsed text, mathematics, unparsed text, key-value pair list, no-space mode} are defined. Since this article tries to establish lexical definitions, the choice of a “mathematics” type is understandable (even though I consider the details of this type far more important than this article makes them appear). However, specifying a “key-value pair list” seems to mix the notions of “lexical types” and “semantic types”. Whereas mathematics introduces a mode with customized lexical conventions, a key-value pair list is usually a data structure and thus a semantic type. In short, what values can be put into a key-value pair list? I assume the other types. But the statement “any white space at the beginning or end of the list will be ignored as well as any white space surrounding the arrows” contradicts the idea of parsed text, which seems to be able to include any Unicode text.
- The syntactic choice for the arrow in key-value pairs is very poor, since software for digital typesetting should not use a contrived representation for an arrow. It should either use the Unicode symbol “→” or any other single character representing association. Oh, this was fixed in the article “Revised definition of commands/environments”.
- “Environment names can consist of any characters except {, } or *”. The inclusion of control characters and line breaks in environment names seems awkward. Furthermore, users can easily be misled due to the lack of Unicode normalization. Oh, this was fixed in the article “Revised definition of commands/environments”.
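As promised above, here is a minimal sketch (my own code, not finl’s implementation) of why \verb forces the lexer to carry state: the delimiter following \verb must be remembered until it occurs a second time.

```rust
/// Scans `\verb<delim>...<delim>` starting right after "\verb".
/// The lexer must remember the delimiter character until it
/// reappears, i.e. the grammar needs this extra memory at this
/// point. My own sketch, not finl's implementation.
fn scan_verbatim(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    let (_, delim) = chars.next()?;          // remember the delimiter
    for (i, c) in chars {
        if c == delim {                      // matched it again
            let body = &input[delim.len_utf8()..i];
            let rest = &input[i + c.len_utf8()..];
            return Some((body, rest));
        }
    }
    None                                     // unterminated verbatim
}

fn main() {
    let after_verb = "+some_verbatim+ and more";
    assert_eq!(
        scan_verbatim(after_verb),
        Some(("some_verbatim", " and more"))
    );
}
```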
“Page references”
The representation of references across different media with different features is indeed crucial. Printing the URL in parentheses after the text is a popular approach, but it is distracting for longer URLs. Another approach is a footnote. Any more sophisticated adjustment of the text (“at doc.rust-lang.org” versus “on page 32”) requires advanced Natural Language Processing for a multitude of languages.
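To illustrate, here is a hypothetical sketch (all names are mine) of medium-dependent reference rendering; the hard-coded English phrases are exactly where the NLP problem enters.

```rust
/// Output medium for a resolved reference. Hypothetical types,
/// only meant to illustrate the print-versus-web split.
enum Medium {
    Print { page: u32 },
    Web { url: String },
}

/// Renders the text around a reference target. The fixed English
/// phrases are the weak spot: every supported language needs its
/// own preposition and word order.
fn render_reference(label: &str, medium: &Medium) -> String {
    match medium {
        Medium::Print { page } => format!("{label} on page {page}"),
        Medium::Web { url } => format!("{label} at {url}"),
    }
}

fn main() {
    let print = Medium::Print { page: 32 };
    let web = Medium::Web { url: "https://doc.rust-lang.org".into() };
    println!("{}", render_reference("the borrow chapter", &print));
    println!("{}", render_reference("the borrow chapter", &web));
}
```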
“A current status on finl”
- Regarding the roadmap, the focus on gftopdf seems to indicate that Don is focused on the Metafont stack. This neglects all the advances in the field of modern typography, such as the OpenType standard. I am also curious about the details of finl-math. The notation for mathematical typesetting and its integration are crucial, and I wonder how it is meant to work with MathML stacks.
- “There will be non-Turing complete macros available to document authors, but any more sophisticated handling will be done through extensions written in Python.” Oh, that’s new.
“Building a trie in Rust—optimizations”
Thank you for the interesting article!
“Revised definition of commands/environments”
- “A single non-letter is defined to include symbols which may be represented by multiple code points but a single output grapheme”. I think this matches the notion of a grapheme cluster.
- The parameter formats seem complex, but sticking to xparse seems like a viable approach; see the sketch after this list.
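Since the previous item leans on xparse, here is a sketch, entirely my own modeling and not finl’s actual parameter format, of xparse-style argument specifiers as a Rust type; each variant implies a different scanning rule for the parser.

```rust
/// A sketch of xparse-like argument specifiers. It mirrors the
/// flavor of xparse's m/o/O{default}/s specifiers; my own modeling,
/// not finl's actual parameter format.
enum ArgSpec {
    Mandatory,                   // like xparse's `m`: {...}
    Optional,                    // like `o`: [...] or absent
    OptionalWithDefault(String), // like `O{default}`
    Star,                        // like `s`: an optional star
}

fn describe(spec: &ArgSpec) -> String {
    match spec {
        ArgSpec::Mandatory => "required braced argument".into(),
        ArgSpec::Optional => "optional bracketed argument".into(),
        ArgSpec::OptionalWithDefault(d) => {
            format!("optional argument, defaults to {d:?}")
        }
        ArgSpec::Star => "optional star modifier".into(),
    }
}

fn main() {
    // e.g. \section*[short]{Title} would be declared as: s, o, m
    let signature = [ArgSpec::Star, ArgSpec::Optional, ArgSpec::Mandatory];
    for spec in &signature {
        println!("{}", describe(spec));
    }
}
```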
“Time spent digging through unicode categories”
Don writes: “Some of the basic stuff, what I’m using in finl, is ok: letters are correctly identified and classed and marks are mostly ok (although it’s not clear why in some scripts derived from Brahmi, vowels are treated as spacing marks and in others they’re letters). Perhaps if I knew a bit more about the alphabets I could make sense of them.”
Indeed, typesetting always needs to know which writing system and language are currently being typeset.
A brief look at the crates
finl_charsub
The idea of finl_charsub 1.0 is to provide an API for applying character substitutions efficiently.
```rust
use finl_charsub::charsub::CharSubMachine;

fn main() {
    let mut char_sub_machine = CharSubMachine::new();
    char_sub_machine.add_substitution("incomplete", "X");
    // The machine holds back a potential partial match ...
    assert_eq!("This is ", char_sub_machine.process("This is incomplet"));
    // ... until flush() is called, which returns the buffered rest.
    assert_eq!("incomplet", char_sub_machine.flush());
}
```
I think the API is neat even though it maintains state in the output routine (hence the flush instruction). I also appreciate its proper error handling. It could be generalized to non-UTF-8 text. It is awkward, though, for a library to print arbitrary debug information to stdout.
finl_unicode
finl_unicode 1.1 provides functionality for character categories straight from the Unicode standard as well as a grapheme clustering algorithm:
```rust
use finl_unicode::categories::CharacterCategories;
use finl_unicode::grapheme_clusters::Graphemes;

fn main() {
    // Unicode general category query: '€' is in category Sc (currency symbol).
    println!("is currency? {}", '€'.is_symbol_currency());
    // Iterate over grapheme clusters instead of code points.
    for grapheme in Graphemes::new("👪") {
        println!("grapheme: '{}'", grapheme);
    }
}
```
I appreciate the benchmark results provided and the readability of the source code. Assuming everything is functionally correct, this implementation is very handy.
Conclusion
Remarks on other blog posts
On “Stupid Mac tricks”: This is a trivial feature, considering the kernel (via inotify on Linux or FSEvents on macOS) recognizes which paths change and which file descriptors are open.
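For illustration, here is a minimal sketch of such a file watcher in Rust using the third-party notify crate (which wraps inotify on Linux and FSEvents on macOS); this is my example, not anything from Don’s post.

```rust
use notify::{recommended_watcher, RecursiveMode, Watcher};
use std::path::Path;

fn main() -> notify::Result<()> {
    // The OS delivers change events; no polling is required.
    let mut watcher = recommended_watcher(|res| match res {
        Ok(event) => println!("event: {:?}", event),
        Err(e) => eprintln!("watch error: {:?}", e),
    })?;
    // Watch the current directory recursively.
    watcher.watch(Path::new("."), RecursiveMode::Recursive)?;
    // Keep the watcher alive long enough to observe some events.
    std::thread::sleep(std::time::Duration::from_secs(60));
    Ok(())
}
```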
On “First impressions on Rust”: The book “The Rust Programming Language” has reached a very mature state. I am happy to see that it helped Don.
Conclusion on finl
finl seems to strive for a set of desirable properties, but neglects some important fields. If one designs a new markup language: what about syntax highlighting? What about language server support? What about integration with data serialization formats like XML, TOML, CSV, and YAML? If fonts are supposed to be supported: which formats (WOFF, TrueType, OpenType) will be supported? How does one close the gap between print (sure, the font specification can be arbitrary and needs to be compiled into a representation PDF is happy with) and web (where it is OpenType or WOFF)? How does one influence the shaping algorithm to substitute certain sequences or characters because the available font is known to produce bad results? What is the box model, or more generally, how are document element layouts specified? What are the cross-language approaches to topics like hyphenation and line breaking?
I appreciate Don’s efforts, but digital typesetting is not an easy field. Many different standards come together, and the software needs to integrate neatly with existing stacks. It took Don Knuth ten years, and he defined many standards on his own because existing ones were lacking. These days the required effort is much higher.
In short, I don’t think finl will play any role within the next ten years of digital typesetting, because the design is premature and progress is slow.