String escaping strategies

✍️ → Written on 2019-09-18 in 222 words. Part of cs software-development

As the designer of a formal grammar, you often need to think about escaping. Consider a markup language in which you want to introduce a source code block that can hold arbitrary content. The question is: how do you specify where such a source code block ends? Let us look at two examples:

Markdown lets you specify source code blocks the following way:

```rust
fn main() {
    println!("Hello World!")
}
```

Asciidoc[tor] lets you specify source code blocks the following way:

[source,rust]
----
fn main() {
    println!("Hello World!")
}
----

Apparently, ``` is the sequence that ends a source code block in Markdown, and ---- is the one that ends an Asciidoc[tor] source block. I want to consider all approaches generically:

  • cs is a control sequence.

Escaping requires that the special character sequence be denoted differently when it is meant literally: \cs

However, "\cs" must be escapable again: \\cs

But then there are different interpretations:

  • \\cs: an escaped backslash, followed by the control sequence "cs"
  • \\\\cs: 2 escaped backslashes
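To make the interpretations concrete, here is a minimal sketch of one possible rule: read from left to right, and let a backslash take the next unit (another backslash or "cs") literally. The names `Token`, `CS`, and `tokenize` are my own and only serve the illustration. Other rules are just as valid; this is one consistent choice, not the definitive one.

```rust
/// What the reader of the markup sees after escaping is resolved.
#[derive(Debug, PartialEq)]
enum Token {
    Text(String), // literal content
    Control,      // an unescaped control sequence
}

/// The control sequence from the text (hypothetical, two plain letters).
const CS: &str = "cs";

fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut text = String::new();
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after) = rest.strip_prefix('\\') {
            if let Some(tail) = after.strip_prefix('\\') {
                // Escaped backslash: one literal backslash.
                text.push('\\');
                rest = tail;
            } else if let Some(tail) = after.strip_prefix(CS) {
                // Escaped control sequence: literal "cs".
                text.push_str(CS);
                rest = tail;
            } else {
                // Assumption: a backslash before anything else stays as it is.
                text.push('\\');
                rest = after;
            }
        } else if let Some(tail) = rest.strip_prefix(CS) {
            // Unescaped control sequence: flush pending text, then emit it.
            if !text.is_empty() {
                tokens.push(Token::Text(std::mem::take(&mut text)));
            }
            tokens.push(Token::Control);
            rest = tail;
        } else {
            // Ordinary character: copy it over.
            let c = rest.chars().next().unwrap();
            text.push(c);
            rest = &rest[c.len_utf8()..];
        }
    }
    if !text.is_empty() {
        tokens.push(Token::Text(text));
    }
    tokens
}
```

Under this rule, `tokenize(r"\cs")` yields the literal text "cs", `tokenize(r"\\cs")` yields a literal backslash followed by the control sequence, and `tokenize(r"\\\cs")` yields the literal text "\cs". The ambiguity disappears only because the rule is fixed in advance.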

<html> is a control sequence. < and > have to be escaped: &lt;html&gt; But shouldn't &lt;html> suffice in terms of escaping? Once the opening < is escaped, it is implicitly clear that the > is meant literally as well.
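The two options can be put side by side in a short Rust sketch. The function names `escape_full` and `escape_minimal` are made up for this post. Both also escape `&`, since the escape character itself has to be escapable, just like the backslash above.

```rust
/// Escapes every character that could start or end a tag or an entity.
fn escape_full(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"), // the escape character itself
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Escapes only what can start a tag or an entity; '>' is left alone.
/// '&' is replaced first so that the "&" of "&lt;" is not escaped again.
fn escape_minimal(input: &str) -> String {
    input.replace('&', "&amp;").replace('<', "&lt;")
}
```

Here `escape_full("<html>")` gives `&lt;html&gt;`, while `escape_minimal("<html>")` gives `&lt;html>`. The minimal variant relies on the reader never treating a lone `>` as the end of a tag, which is the implicit assumption discussed above.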