<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://lukas-prokop.at/feed.xml" rel="self" type="application/atom+xml" /><link href="https://lukas-prokop.at/" rel="alternate" type="text/html" /><updated>2026-01-01T17:27:32+01:00</updated><id>https://lukas-prokop.at/feed.xml</id><title type="html">Lukas’ weblog</title><subtitle>My weblog about life, languages, culture and technology</subtitle><author><name>Lukas Prokop</name></author><entry xml:lang="en"><title type="html">Some ways to approach focus in life</title><link href="https://lukas-prokop.at/articles/2025-12-31-how-to-approach-focus-in-life" rel="alternate" type="text/html" title="Some ways to approach focus in life" /><published>2025-12-31T15:00:00+01:00</published><updated>2026-01-01T17:27:22+01:00</updated><id>https://lukas-prokop.at/articles/2025-12-31-how-to-approach-focus-in-life</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-12-31-how-to-approach-focus-in-life"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>There is a recurring pattern in my life: in the quiet minutes after work days, I am sufficiently good at getting projects done. But once leisure time (holidays or vacation) arrives, I crave the moment to sit down, revisit the current state of affairs, and plan the next steps. I like that moment. It needs a lot of mental freedom, but I like it so much that I make sure to get such a moment from time to time.</p>
</div>
<div class="paragraph">
<p>The most high-level question one can answer in this moment is “What do I want to focus on in my life?”. I am lucky enough not to be in existential mode all the time. At Aikido in Graz recently, I met a person who had come from <a href="https://en.wikipedia.org/wiki/Odesa">Odesa</a> to Graz one month prior. We had a longer chat, and that person obviously had only one focus in mind: ‘SURVIVE’. I am fortunate enough to be able to focus on other topics as well, like digital typesetting or languages.</p>
</div>
<div class="paragraph">
<p>In this post, I want to present three frameworks which can help to identify what one wants to focus on.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="framework-1-maslowss-pyramid-of-needs">Framework 1: Maslow&#8217;s pyramid of needs</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The categories are taken from the <a href="https://en.wikipedia.org/w/index.php?title=Maslow%27s_hierarchy_of_needs&amp;oldid=1327776047">famous pyramid</a>, whereas the notes next to each category are my own. My personal goal is to have some projects in each category. If I have many projects in the category ‘Esteem’ but few projects in ‘Physiological Needs’, I lower the priority of ‘Esteem’ projects and increase the priority of ‘Physiological Needs’ projects.</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><em>Self-Actualisation (most sophisticated need)</em>: get bored intentionally to spark creativity; implement new approaches/ideas; define goals for the next 1/5/10 years of your life; spread profound knowledge you gained through long-term efforts</p>
</li>
<li>
<p><em>Esteem</em>: interact/discuss matters with loved ones to reinforce ideas/concepts and dismantle prejudices; consciously do something which was once difficult in your life, but is now trivial; give talks; engage in a podcast episode; implement a difficult project with a good strategy at your disposal</p>
</li>
<li>
<p><em>Love &amp; Belonging</em>: plan ahead how/when you can meet friends/family to prevent neglect; strengthen your romantic relationships; organize communal events; invite someone over; talk to the neighbors; give tiny gifts to, or make small talk with, people you know but are not close to</p>
</li>
<li>
<p><em>Safety needs</em>: physical exercise; think through fallback options in case personal belongings get lost; discuss insurance options; define reassuring thresholds for income versus expenses; invest in physical or digital security</p>
</li>
<li>
<p><em>Physiological needs</em>: plan ahead food supplies; keep emergency food in storage; take care of your household and workplace; buy new clothes <em>before</em> the old ones fall apart; get enough sleep</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>I am pretty sure my notes are a bit of a stretch. Some might be miscategorized w.r.t. Maslow and many may not apply to your idea of ‘projects’. But it should give you an idea of my approach: look at each category, write down notes, and attach my (mostly existing) projects to the notes. After this process, I might recognize that my current projects are egocentric. Or that I could share my new knowledge in a certain talk.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="framework-2-time-dimension">Framework 2: time dimension</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Another approach can be to have projects in all of these three categories:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><em>past</em>: finish an old project which still makes sense, but had too little value to be finished; revisit old responsibilities which are usually self-initiated but should be done (→ volunteering work); renew friendships; summarize past experiences</p>
</li>
<li>
<p><em>present</em>: finish work-in-progress projects; answer inquiries; review pull requests</p>
</li>
<li>
<p><em>future</em>: plan ahead schedules; think through what would improve your life; consider how a new technology could improve your life</p>
</li>
</ol>
</div>
</div>
</div>
<div class="sect1">
<h2 id="framework-3-four-necessities">Framework 3: four necessities</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This is a list of categories which I came up with personally. It is a list which makes it easier for me to achieve a uniform distribution of projects (e.g. 10 current projects in each category).</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><em>mental</em>: walks in nature; listening to thoughts of other people (face-to-face or in podcasts); implement creativity projects; expand knowledge; learn new skills</p>
</li>
<li>
<p><em>social</em>: host friends; engage in local clubs &amp; events; discuss life with people from other cultures; understand how society works on a local/regional/national/international level; build relationships &amp; friendships</p>
</li>
<li>
<p><em>physical</em>: stretching exercises; exercises for strength; running; bicycling; ball games; martial arts</p>
</li>
<li>
<p><em>representational</em>: report project status; publish your knowledge; micro-blog; write blog posts</p>
</li>
</ol>
</div>
</div>
</div>
<div class="sect1">
<h2 id="conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I am fully aware that my blog post makes many assumptions. I have a very broad definition of project (“any TODO which takes more than 1 day of work”) and I consider projects in a flat structure (umbrella projects and subprojects are treated alike). Evening out projects across categories will not help everyone. But if this blog post helped you to regain a high-level perspective on what you are focusing on in life, it was worth writing down.</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="life" /><category term="reflection" /><summary type="html"><![CDATA[Motivation]]></summary></entry><entry xml:lang="en"><title type="html">U([0, 1)) → U([0, 1])</title><link href="https://lukas-prokop.at/articles/2025-12-28-mapping-same-measure-sets" rel="alternate" type="text/html" title="U([0, 1)) → U([0, 1])" /><published>2025-12-28T10:00:00+01:00</published><updated>2026-01-01T17:10:36+01:00</updated><id>https://lukas-prokop.at/articles/2025-12-28-mapping-same-measure-sets</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-12-28-mapping-same-measure-sets"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Given a sampler for a uniform distribution in [0,1), can one obtain a sampler for a uniform distribution in [0,1] (i.e. the value 1 is possible as well)?</p>
</div>
<div class="paragraph">
<p>First, a disclaimer: the question itself is ridiculous and a purely mathematical thought experiment. I also don&#8217;t have a solution here. But it was fun to think it through. It is ridiculous for one of two reasons. Either you need it in practice on a computer; then you have to consider the realities of IEEE 754, which cannot represent a continuous interval, and you need completely different approaches than the ones presented here. Or you consider the mathematically continuous range [0,1) in ℝ; in this case, measure theory tells you that the measures of [0,1) and [0,1] are equal, and thus the question becomes boring in all mathematical contexts.</p>
</div>
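<div class="paragraph">
<p>For the practical, floating-point side of the question, one possible sketch (not part of the thought experiment itself) samples an integer grid instead of an interval. The function name <code>closed_unit_sample</code> and the grid resolution of 2^53 steps are assumptions made for this illustration; the resolution mirrors the 53-bit significand of an IEEE 754 double.</p>
</div>

```python
import random

GRID = 2 ** 53  # assumed resolution, chosen to match the 53-bit significand


def closed_unit_sample(rng: random.Random) -> float:
    """Return k / GRID for a uniformly chosen integer k in 0..GRID (inclusive)."""
    k = rng.randint(0, GRID)  # randint includes both endpoints
    return k / GRID           # k = 0 gives 0.0, k = GRID gives 1.0


rng = random.Random(2025)
samples = [closed_unit_sample(rng) for _ in range(1000)]
```

<div class="paragraph">
<p>This only yields a uniform distribution over a finite grid of points, which is a different object than the continuous U([0, 1]) discussed in the rest of this post.</p>
</div>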
<div class="paragraph">
<p>Ok, so we can sample values from a uniform distribution U([0, 1)) and need an algorithm which returns one value more with the same probability.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="first-idea">First idea</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Let \(s ∈ U([0,1))\).</p>
</div>
<div class="stemblock">
<div class="content">
\[ u(s) := \frac{1}{1-s} \]
</div>
</div>
<div class="paragraph">
<p>So with \(f(x) = 1-x\) I can map \(U([0,1))\) to \(U((0,1])\). Function \(g(x) = 1/x\) is fun, because it gets arbitrarily close to 0, but does not reach it. The argument 0 itself is disallowed in ℝ (cf. <a href="https://en.wikipedia.org/wiki/Partial_function">partial function</a>). We use this mechanism in the way that 1 is never returned by the sampler \(U([0,1))\), so \(1-s\) is never 0. However, the idea fails because it completely distorts the uniform probability: since \(1-s ∈ (0,1]\), the result \(u(s)\) lies in \([1, ∞)\), and value 0 cannot be attained.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="second-idea">Second idea</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Let \(s ∈ U([0,1))\) and \(t ∈ U([0,1))\).</p>
</div>
<div class="stemblock">
<div class="content">
\[ u(s, t) := \frac{s+(1-t)}{2} \]
</div>
</div>
<div class="paragraph">
<p>Now the idea is to sample twice. We use one value in [0,1) (namely \(s\)) and one value in (0,1] (namely \(1-t\)). Division by 2 just maps the sum from interval (0, 2) to (0, 1). Unlike \(1/x\), addition looks harmless at first, but it distorts the probability as well: the sum of two independent uniform samples follows a triangular distribution, so values near 0.5 are more likely than values near the borders. And we already know a second reason why this result is invalid: values 0 and 1 are excluded.</p>
</div>
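<div class="paragraph">
<p>The shape of the second idea&#8217;s distribution can be checked empirically. The following is a minimal sketch; the function name <code>second_idea</code>, the seed, and the sample count are arbitrary choices for illustration. For a uniform distribution, about half of the samples would fall into [0.25, 0.75); for a triangular distribution peaked at 0.5, about three quarters do.</p>
</div>

```python
import random


def second_idea(rng: random.Random) -> float:
    s = rng.random()  # s in [0, 1)
    t = rng.random()  # t in [0, 1), so 1 - t lies in (0, 1]
    return (s + (1 - t)) / 2


rng = random.Random(1)
n = 100_000
samples = [second_idea(rng) for _ in range(n)]

# Share of samples in the middle half of the unit interval.
middle_share = sum(0.25 <= x < 0.75 for x in samples) / n
print(round(middle_share, 2))
```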
</div>
</div>
<div class="sect1">
<h2 id="third-idea">Third idea</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Now I started integrating the idea of rejection sampling: draw a sample, test it, and potentially reject it and do something else.</p>
</div>
<div class="paragraph">
<p>Let \(s ∈ U([0,1))\) and \(t ∈ U([0,1))\).</p>
</div>
<div class="stemblock">
<div class="content">
\[ u(s, t) := \begin{cases}
  s      &amp; \text{if } s &lt; 0.5 \\
  1-t    &amp; \text{else}
\end{cases} \]
</div>
</div>
<div class="paragraph">
<p>Due to the case distinction, we either pick a value in [0, 0.5) or (0, 1]. With set union, this makes [0, 1] which is our desired interval. However, did we preserve the properties of a uniform distribution? It immediately looks suspicious if the intervals of the two cases overlap, because this distorts the probability. Value 0.3 might be returned due to case 1 or 2, whereas value 0.7 can only result from case 2. Hence, the values in (0, 0.5) are three times as likely as the values in [0.5, 1]: the lower half is hit with probability 0.5 + 0.5 · 0.5 = 0.75.</p>
</div>
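<div class="paragraph">
<p>A quick simulation illustrates the imbalance of the third idea. This is only a sketch; the function name <code>third_idea</code>, the seed, and the sample count are illustrative assumptions. The lower half should receive a share of P(s &lt; 0.5) + P(s ≥ 0.5) · P(1 − t &lt; 0.5) = 0.5 + 0.5 · 0.5 = 0.75 of all samples.</p>
</div>

```python
import random


def third_idea(rng: random.Random) -> float:
    s = rng.random()
    t = rng.random()
    # Case 1 returns s from [0, 0.5); case 2 returns 1 - t from (0, 1].
    return s if s < 0.5 else 1 - t


rng = random.Random(3)
n = 100_000
samples = [third_idea(rng) for _ in range(n)]

lower_share = sum(x < 0.5 for x in samples) / n  # expected to be near 0.75
upper_share = 1 - lower_share                    # expected to be near 0.25
print(round(lower_share, 2), round(upper_share, 2))
```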
</div>
</div>
<div class="sect1">
<h2 id="fourth-idea">Fourth idea</h2>
<div class="sectionbody">
<div class="paragraph">
<p>So let us align the intervals.</p>
</div>
<div class="paragraph">
<p>Let \(s ∈ U([0,1))\) and \(t ∈ U([0,1))\).</p>
</div>
<div class="stemblock">
<div class="content">
\[ u(s, t) := \begin{cases}
  s              &amp; \text{if } s &lt; 0.5 \\
  1-\frac{t}{2}  &amp; \text{else}
\end{cases} \]
</div>
</div>
<div class="paragraph">
<p>Now the two cases cover the intervals [0, 0.5) and (0.5, 1]. So we miss value 0.5 … damn.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="fifth-idea">Fifth idea</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Does it help if we make the condition independent of the returned value?</p>
</div>
<div class="paragraph">
<p>Let \(r, s, t ∈ U([0,1))\).</p>
</div>
<div class="stemblock">
<div class="content">
\[ u(r, s, t) := \begin{cases}
  s      &amp; \text{if } r &lt; 0.5 \\
  1-t    &amp; \text{else}
\end{cases} \]
</div>
</div>
<div class="paragraph">
<p>The two cases cover the intervals [0, 1) and (0, 1]. Their union is [0, 1], which covers our desired interval. However, value 0 can only be created if the first case applies, whereas 1 can only be reached through the second case. All other values can be returned from either case. So values {0, 1} are less likely than other values.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="fifth-idea-2">Fifth idea, second attempt</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Ok, can we introduce a special handling for the one value we want to extend upon?</p>
</div>
<div class="paragraph">
<p>Let \(s, t ∈ U([0,1))\).</p>
</div>
<div class="stemblock">
<div class="content">
\[ u(s, t) := \begin{cases}
  \begin{cases}
    0       &amp; \text{if } t &lt; 0.5 \\
    1       &amp; \text{else}
  \end{cases} &amp; \text{if } s = 0 \\
  s         &amp; \text{else}
\end{cases} \]
</div>
</div>
<div class="paragraph">
<p>If we start to create two return values for one value, we obviously split the probability into two. Thus we don&#8217;t create a uniform distribution, because values {0, 1} are half as likely.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="sixth-idea">Sixth idea</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Ok, let us get back to previous ideas. If we don&#8217;t use less-than but less-than-or-equal in the condition, we can get a closed interval. This way, we can design our desired interval.<br>
Now, I need to use a loop to express the process:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>forever</p>
<div class="olist loweralpha">
<ol class="loweralpha" type="a">
<li>
<p>Let \(s, t ∈ U([0,1))\).</p>
</li>
<li>
<p>if \(s\) ≤ 0.5, then return \(s\)</p>
</li>
<li>
<p>if \(t\) ≤ 0.5, then return \(1-t\)</p>
</li>
</ol>
</div>
</li>
</ol>
</div>
<div class="paragraph">
<p>So we either return a value from the interval [0, 0.5] or from [0.5, 1]. Because we neglect the cases when the condition is not true, it is difficult for me to quantify the probability distribution in this case. I think the fact that value 0.5 occurs in both cases makes it more likely than other values. However, I think this approach is also bad, because we have no guarantee of termination. We cannot guarantee that we always get a sampled value.</p>
</div>
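<div class="paragraph">
<p>A simulation can help to quantify the distribution of this loop. Per round, step (b) returns with probability 1/2, step (c) returns with probability 1/4, and with probability 1/4 the round is rejected. So about two thirds of the returned values should fall into [0, 0.5], and the loop should need 4/3 rounds on average. It terminates with probability 1, although no fixed upper bound on the number of rounds exists. The sketch below assumes the name <code>sixth_idea</code>, an arbitrary seed, and an arbitrary sample count.</p>
</div>

```python
import random


def sixth_idea(rng: random.Random) -> tuple[float, int]:
    """Run the loop; return the sample and the number of rounds it took."""
    rounds = 0
    while True:
        rounds += 1
        s = rng.random()
        t = rng.random()
        if s <= 0.5:
            return s, rounds
        if t <= 0.5:
            return 1 - t, rounds
        # Both checks failed: reject this round and draw again.


rng = random.Random(6)
n = 100_000
results = [sixth_idea(rng) for _ in range(n)]

lower_share = sum(x <= 0.5 for x, _ in results) / n
mean_rounds = sum(r for _, r in results) / n
print(round(lower_share, 2), round(mean_rounds, 2))
```

<div class="paragraph">
<p>So besides the doubled value 0.5, the lower half as a whole is about twice as likely as the upper half, which rules out this approach as well.</p>
</div>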
</div>
</div>
<div class="sect1">
<h2 id="conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I looked at different ideas to extend U([0,1)) to U([0,1]), but failed to find a mathematically beautiful solution. I am not a probability theorist, nor did I do any literature research. I wrote down the question several years ago during my studies as a note and discussed it with work colleagues recently. The brainstorming led to what I wrote in the disclaimer. Finally, I think combining a function and its inverse (functions \(10^x\) and \(\log_{10}(x)\)) might be a viable approach, but I lacked the creativity to combine them usefully. Facts like \(1/x \neq 0\), \(10^x \neq 0\), \(\log_{10}(1) = 0\) and \(10^0 = 1\) can include/exclude the missing value. But all of it is just food for thought.</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="math" /><category term="probability-theory" /><summary type="html"><![CDATA[Motivation]]></summary></entry><entry xml:lang="en"><title type="html">Report from the European Day of Languages</title><link href="https://lukas-prokop.at/articles/2025-10-11-raporto-de-la-e%C5%ADropa-tago-de-lingvoj" rel="alternate" type="text/html" title="Report from the European Day of Languages" /><published>2025-10-11T00:00:00+02:00</published><updated>2025-10-19T12:58:36+02:00</updated><id>https://lukas-prokop.at/articles/2025-10-11-raporto-de-la-e%C5%ADropa-tago-de-lingvoj</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-10-11-raporto-de-la-e%C5%ADropa-tago-de-lingvoj"><![CDATA[<div class="paragraph">
<p><strong>Update 2025-10-19:</strong> A friend sent me corrections for my mistakes. I have updated this article.</p>
</div>
<div class="sect1">
<h2 id="enkonduko">Introduction</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The Esperanto club of Graz took part in the European Day of Languages.
I was there and helped at the booth in front of the Schlossberg stairs.
A friend asked for an article about the experience. Voilà!</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="eŭropa-tago">European Day</h2>
<div class="sectionbody">
<div class="paragraph">
<p>On September 26, the European Day of Languages took place for the 25th time. This day celebrates the diversity of European languages and connects people who are interested in other languages. Graz is the second-largest city of Austria and is subject to many linguistic influences. Slovenia is fairly close, and many speakers of Slavic languages live in Graz or visit the city. Pupils usually choose among Romance languages such as Italian, Spanish, French, or Latin itself. In addition, Graz has many universities. Foreign students study in Graz for one or more semesters, so one can have a good experience learning about cultures far away from Austria. I myself interacted with people from Colombia, South Africa, China, and Japan. In conclusion, Graz has real experience with the topic, and the festivities started at 9:00 in the city center at Schlossbergplatz.</p>
</div>
<div class="imageblock center">
<div class="content">
<img src="../assets/img/2025-10-11_tendo.jpg" alt="Tents of the European Day of Languages on September 26, 2025" width="50%">
</div>
</div>
<div class="paragraph">
<p>During the day, many presentations related to languages took place. At 10:45, Muhammed Dumanli presented his poetry slam texts, which focus on cultural aspects. At 13:15, a dance group organized a communal line dance. Every participant was welcome to dance, following the organizers&#8217; instructions, to American country music. At 14:30, one could listen to an English play. At 15:45, Simon Ošlak-Gerasimov read his text about the Slovene side of Graz. Remember that the name Graz is based on the Slovene word ‘gradec’, meaning ‘small castle’.</p>
</div>
<div class="imageblock center">
<div class="content">
<img src="../assets/img/2025-10-11_afiŝo.jpg" alt="Poster with the program items of the Day of Languages" width="50%">
</div>
</div>
<div class="paragraph">
<p>The organizers of the festive day prepared three hubs in the city. First, the nearby Graz museum offered a guided city walk about languages. They also offered a presentation about Austria&#8217;s 30 years of membership in the European Union. Second, many institutions (e.g. the city library and the university of teacher education of Styria) offered a special program about languages on their premises. Third, three tents were set up at Schlossbergplatz. Various groups had booths in these tents. The ‘Institut culturel franco-autrichien’ works toward better relations between France and Austria. They offer French language courses. In addition, the ‘Integrationsreferat der Stadt Graz’ (the city&#8217;s office for integration) and the European Commission in Austria presented themselves. ‘deutsch in graz’ offers German language courses for foreigners and presented them. Among these groups, six Esperantists staffed the Esperanto booth. Esperanto had a table full of books and language posters.</p>
</div>
<div class="imageblock center">
<div class="content">
<img src="../assets/img/2025-10-11_festo.jpg" alt="Tent with the Esperanto group next to its booth" width="50%">
</div>
</div>
<div class="paragraph">
<p>I really enjoyed interacting with interested visitors. When one mentions Esperanto, the responses fall into various categories. One man did not know Esperanto. Families usually try right away to teach their children the pronunciation of basic words. One woman immediately responded, ‘It was a good idea, wasn&#8217;t it? But it didn&#8217;t work, did it?’. When we interacted with people from Indonesia, we looked for common features of our languages.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="konkludo">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>It was a fun day for me, and it was fitting that we represented Esperanto publicly at this event.</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="life" /><category term="Austria" /><category term="languages" /><category term="Esperanto" /><summary type="html"><![CDATA[Update 2025-10-19: A friend sent me corrections for my mistakes. I have updated this article.]]></summary></entry><entry xml:lang="en"><title type="html">Review: Vienna Coffee Festival 2025</title><link href="https://lukas-prokop.at/articles/2025-09-13-vienna-coffee-festival" rel="alternate" type="text/html" title="Review: Vienna Coffee Festival 2025" /><published>2025-09-13T20:00:00+02:00</published><updated>2025-11-19T23:55:30+01:00</updated><id>https://lukas-prokop.at/articles/2025-09-13-vienna-coffee-festival</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-09-13-vienna-coffee-festival"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>A friend of mine invited me to attend the Vienna Coffee Festival together. So we did, for one day. I wanted to write a summary of what I experienced and learned.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="the-event">The event</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The <a href="https://www.viennacoffeefestival.cc/">Vienna Coffee Festival</a> (edition 2025) takes place from Friday (yesterday) to Sunday (tomorrow). The entrance poster mentions “11 years”, so I assume it takes place for the eleventh time. It is an event for coffee lovers where coffee brewers, coffee machine producers, cup and mug producers, coffee roasters, baristas, and latte artists come together to share thoughts and ideas, and to let people experience different kinds of coffee. We got “Super early bird tickets” in April and paid 13€, whereas the ticket on-site costs 25€. Upon entry, you get a mug for a 1€ deposit, which is paid back when you return the mug on leaving. The core idea is that you visit various booths and they brew coffee for you, for free. So you can taste various flavors and roasts. However, there are also booths where people give talks about the economic side of coffee, explain the differences in cultures, or just play music. The Austrian barista competition was also going on during our visit.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<img src="/assets/img/2025-09-13_vienna-coffee-festival_coffee-mug.jpg" alt="I am holding a coffee mug in front of me with a classic heart in latte art style" width="60%">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="the-venue">The venue</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Karl-Marx hall serves as the venue for the festival. At the front, your ticket is checked and you get your deposit mug. Many coffee brewing booths come first, and some talk- or music-related booths can be found in between. The hall is not filled entirely: the final third cannot be reached, and the rear part is used for food stands. We got some Indian vegan lunch, but they also offer Thai or more Austrian/Czech style cuisine. Water supply and toilets are available in the north and well-maintained.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<img src="/assets/img/2025-09-13_vienna-coffee-festival_venue.jpg" alt="A large hall serves as the venue of the festival with many people standing at booths" width="80%">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="what-i-learned">What I learned</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Markus from <a href="https://gota.coffee/">gota coffee</a> was kind enough to walk us through a lot of details about coffee making. He showed us the <a href="https://www.ikawacoffee.com/">ikawa coffee roasting machine</a>. This machine allows you to define a profile, which specifies the temperature curve the machine follows during the roasting process. Roasting aficionados share curves with each other on the ikawa website. Markus pointed out that the machine roasts quite uniformly, which makes results more predictable. The issue with coffee roasting is that you commonly get coffee beans in sacks of 5 kg. Thus it is not easy to try many different coffee beans with the same curves on a hobby level.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<img src="/assets/img/2025-09-13_vienna-coffee-festival_roasting-machine-profile.jpg" alt="A man is holding up his smartphone showing some curve on the screen and some Ikawa machine and some roasted and fresh coffee beans are in the front" width="60%">
</div>
</div>
<div class="paragraph">
<p>By the way, <a href="https://www.myhomeroast.com/">MyHomeRoast</a> was also present at the festival. A saleswoman introduced us to their approach. It seemed like MyHomeRoast has limited configurability (compared to ikawa), but the two also seem to work in different price ranges. Whereas MyHomeRoast machines cost about 700€, the prices of ikawa machines are not even listed on their website.</p>
</div>
<div class="paragraph">
<p>Katharina from gota coffee made us some cascara tea. This is a herbal tea made from the dried skins and pulp of coffee cherries. She mentioned that one uses about 4g of cascara for 100ml of hot water. Cascara was considered a waste product in Europe, but at plantation sites, tea has been made from it since the early days. It contains a small amount of caffeine and we loved it. <a href="https://gota.coffee/blogs/news/austrian-aeropress-championship-2024-a-memorable-day-on-the-danube">Katharina also introduced us to AeroPress</a>, a piece of <a href="https://gota.coffee/collections/equipment/AeroPress">espresso equipment</a> which uses air and human strength to build up the pressure.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="competitions">Competitions</h2>
<div class="sectionbody">
<div class="paragraph">
<p>On Saturday, there were three people competing in “Latte Art” in the morning and five people competing in “Barista” in the afternoon. It is not easy to present your coffee (origin, date of roasting, flavor characteristics, …) and brewing approach (temperature, duration) while actually making the coffee in time. Each participant seemed to finish by explaining their signature drink. We observed a noticeable difference between first-time participants and experienced competitors. The jury watched each step, but notably only one judge was allowed to be in the vicinity of the examinee. It felt like the style of presentation, hygiene, timing, and taste were taken into account.</p>
</div>
<div class="imageblock text-center">
<div class="content">
<img src="/assets/img/2025-09-13_vienna-coffee-festival_jury.jpg" alt="A woman is presenting her barista skills in front of a four-person jury" width="80%">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>What a fun experience. It seems to be a growing community and all people have been kind to us. I am a coffee drinker and have made coffee with different equipment, but what I do is far from the professional level. It was nice to learn some stuff from a different field.</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="life" /><category term="Austria" /><category term="reflection" /><summary type="html"><![CDATA[Motivation]]></summary></entry><entry xml:lang="en"><title type="html">A card game, I learned during high school</title><link href="https://lukas-prokop.at/articles/2025-08-17-cardgame" rel="alternate" type="text/html" title="A card game, I learned during high school" /><published>2025-08-17T20:00:00+02:00</published><updated>2025-08-17T22:28:28+02:00</updated><id>https://lukas-prokop.at/articles/2025-08-17-cardgame</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-08-17-cardgame"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>At high school, we got bored at 10-minute breaks between classes. So my friends taught me a card game. I remember it thoroughly, but I cannot tell its origin. Technically, we called it “stress”, but looking at <a href="https://de.wikipedia.org/wiki/Stress_(Kartenspiel)">the German Wikipedia article</a>, I only recognize different rules.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="the-rules">The rules</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="distribution-of-cards">Distribution of cards</h3>
<div class="ulist">
<ul>
<li>
<p>Each player gets 4 cards to be placed face-down in front.</p>
</li>
<li>
<p>Each player gets 4 cards to be placed face-up on top of the face-down cards.</p>
</li>
<li>
<p>Each player gets 4 cards to be held in hand; only the player himself/herself may see these cards.</p>
</li>
<li>
<p>The remaining cards are put into a stack in the center of all players face-down (<em>drawing stack</em>).</p>
</li>
<li>
<p>One card is removed from the drawing stack and placed face-up. This card initiates the <em>game stack</em>.</p>
</li>
<li>
<p>The card cemetery is the final stack. Once a card is moved there, it cannot be played anymore.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Each player is allowed to exchange cards from one&#8217;s hand with the cards placed face-up. The strategic goal is to have your most valuable cards at your disposal as face-up cards. What makes a card “valuable” follows from the next section.</p>
</div>
</div>
<div class="sect2">
<h3 id="the-value-of-a-card">The value of a card</h3>
<div class="paragraph">
<p>Players take turns counter-clockwise.</p>
</div>
<div class="paragraph">
<p>They have to play a card to <em>react</em> to the top card on the game stack. Suits do not matter. In order to react, one must place one or more cards on top of the game stack; commonly a card of same or higher value (the order from lowest to highest is <code>2 3 4 5 6 7 8 9 10 J Q K A</code>).</p>
</div>
<div class="ulist">
<ul>
<li>
<p>If and only if you have several cards of the same value, you may place them on the game stack in one turn.</p>
</li>
<li>
<p>However, some cards have special semantics:<br></p>
<div class="dlist">
<dl>
<dt class="hdlist1">2</dt>
<dd>
<p>may be played at any point in time, a card of any value may be placed on top.</p>
</dd>
<dt class="hdlist1">3</dt>
<dd>
<p>may be played at any point in time, this card is ‘invisible’ and thus the value of the top card below matters (if there is none, it acts like a ‘2’).</p>
</dd>
<dt class="hdlist1">8</dt>
<dd>
<p>the next player has to skip.</p>
</dd>
<dt class="hdlist1">9</dt>
<dd>
<p>the next card to be played must have a value less than nine.</p>
</dd>
<dt class="hdlist1">10</dt>
<dd>
<p>the entire game stack is moved to the cemetery. The same player starts a new game stack with any card.</p>
</dd>
</dl>
</div>
</li>
<li>
<p>If you fail to provide a card, you have to pick up the entire game stack. All those cards now constitute your cards in your hands to react with (recognize: having many cards available, can provide an advantage in your strategy to react as well). The next player continues and begins a new game stack with any card.</p>
</li>
</ul>
</div>
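<div class="paragraph">
<p>The reaction rules above can be sketched as a small Python function. This is only one reading of the rules, under assumptions the text does not settle: 8 and 10 follow the ordinary ordering when they are played, a 9 may not be answered with another 9, and several 3s in a row are all invisible. The names <code>ORDER</code>, <code>effective_top</code>, and <code>may_play</code> are made up for illustration.</p>
</div>

```python
# Card values in ascending order, as listed in the rules above.
ORDER = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]


def effective_top(stack):
    """Return the card that must be reacted to, skipping 'invisible' 3s."""
    for card in reversed(stack):
        if card != "3":
            return card
    return None  # empty stack or only 3s: behaves like a '2'


def may_play(card, stack):
    """Check whether `card` is an admissible reaction to `stack` (top is last)."""
    if card in ("2", "3"):
        return True  # playable at any point in time
    top = effective_top(stack)
    if top is None or top == "2":
        return True  # any value may be placed on a '2'
    if top == "9":
        # assumption: strictly below nine, so a 9 cannot answer a 9
        return ORDER.index("9") > ORDER.index(card)
    return ORDER.index(card) >= ORDER.index(top)


print(may_play("K", ["7", "3"]))  # the 3 is invisible, so the 7 counts
print(may_play("J", ["9"]))       # a 9 demands a value below nine
```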
<div class="imageblock center">
<div class="content">
<img src="../assets/img/2025-08-11_cardgame.jpg" alt="Cardgame scenario" width="70%">
</div>
<div class="title">Figure 1. A game situation in the card game. I have some cards in my hand, and four cards each are placed face-up and face-down. I have many high cards (Ace) or special cards (8, 10) in my two decks which might make me lucky. But my opponent has some special cards (3, 9) in the face-up deck as well. The game stack is visible in the middle, and the drawing stack is barely visible on the far-right. The cemetery is not visible.</div>
</div>
</div>
<div class="sect2">
<h3 id="who-wins-the-game">Who wins the game</h3>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>First, you may only play the cards in your hands. If you have fewer than 4 cards, you need to draw one card from the drawing stack as long as some are available.</p>
</li>
<li>
<p>If you have no “cards in your hands” left anymore, you have to play with the four face-up cards in front of you. Note: it is an advantage for other players to know your values now. They can slow down your progress with the last remaining cards.</p>
</li>
<li>
<p>If you have no “face-up cards” left anymore, you have to play with the four face-down cards now. Since no-one can see the value of the cards (not even the selecting player), the player picks an arbitrary one and hopes to provide an admissible card. Even if you fail four times, one face-down card is removed each turn and thus the selecting player makes progress.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>If all cards in one&#8217;s hands, face-up, and face-down have been placed admissibly on the game stack, the player has finished. The player who finishes first wins.</p>
</div>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="cs" /><category term="software-development" /><summary type="html"><![CDATA[Motivation]]></summary></entry><entry xml:lang="en"><title type="html">Guidelines for the design of file formats</title><link href="https://lukas-prokop.at/articles/2025-08-05-guidelines-for-design-of-fileformats" rel="alternate" type="text/html" title="Guidelines for the design of file formats" /><published>2025-08-05T18:00:00+02:00</published><updated>2025-11-19T23:56:51+01:00</updated><id>https://lukas-prokop.at/articles/2025-08-05-guidelines-for-design-of-fileformats</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-08-05-guidelines-for-design-of-fileformats"><![CDATA[<div class="paragraph">
<p><strong>Update 2025-10-19:</strong> I added edn to the mentioned file formats.</p>
</div>
<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I don&#8217;t think I am an expert on this topic, but I feel like there are some simple guidelines which don&#8217;t get retold often enough. I designed some file formats myself and have had some frustrating experiences with recurring file format definition errors. Academically, I think there should be more interest in this topic instead of the common “I parsed PDF files using machine learning” papers. Anyhow, I hope some academics find some more answers to open questions, but for now let us summarize those guidelines.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="prior-art">Prior art</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I listed the entries by my own rating of “contributes to the topic at hand” from most to least significant:</p>
</div>
<div class="ulist bibliography">
<ul class="bibliography">
<li>
<p><a id="paperDSL"></a>[paperDSL] <a href="http://arxiv.org/abs/1409.2378">paper “Design Guidelines for Domain Specific Languages”</a> (2014) by Karsai, Krahn, Pinkernell, Rumpe, Schindler, and Völkel</p>
</li>
<li>
<p><a id="talk38c3"></a>[talk38c3] <a href="https://youtu.be/zUVYvE9tLBM?si=7ocAZMJrDK02DvBT&amp;t=2278">talk “38C3 - Fearsome File Formats”</a> (2024-12-30) by Ange Albertini</p>
</li>
<li>
<p><a id="articleKOMPPA"></a>[articleKOMPPA] <a href="https://solhsa.com/oldernews2025.html#ON-FILE-FORMATS">article “On File Formats”</a> (2025-05-19) by Jari Komppa</p>
</li>
<li>
<p><a id="paperLITTLE"></a>[paperLITTLE] <a href="https://doi.org/10.1145/6424.31569">paper “Little Languages”</a> (1986) by Jon Bentley</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="paper-design-guidelines-for-domain-specific-languages">Paper “Design Guidelines for Domain Specific Languages”</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This paper from 2014 <a href="#paperDSL">[paperDSL]</a> lists 26 guidelines in five categories. Without discussing them in detail, I am going to list their names here:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Language Purpose</p>
<div class="ulist">
<ul>
<li>
<p><em>Guideline 1:</em> Identify language uses early</p>
</li>
<li>
<p><em>Guideline 2:</em> Ask questions</p>
</li>
<li>
<p><em>Guideline 3:</em> Make your language consistent</p>
</li>
</ul>
</div>
</li>
<li>
<p>Language Realization</p>
<div class="ulist">
<ul>
<li>
<p><em>Guideline 4:</em> Decide carefully whether to use graphical or textual realization</p>
</li>
<li>
<p><em>Guideline 5:</em> Compose existing languages where possible</p>
</li>
<li>
<p><em>Guideline 6:</em> Reuse existing language definitions</p>
</li>
<li>
<p><em>Guideline 7:</em> Reuse existing type systems</p>
</li>
</ul>
</div>
</li>
<li>
<p>Language Content</p>
<div class="ulist">
<ul>
<li>
<p><em>Guideline 8:</em> Reflect only the necessary domain concepts</p>
</li>
<li>
<p><em>Guideline 9:</em> Keep it simple</p>
</li>
<li>
<p><em>Guideline 10:</em> Avoid unnecessary generality</p>
</li>
<li>
<p><em>Guideline 11:</em> Limit the number of language elements</p>
</li>
<li>
<p><em>Guideline 12:</em> Avoid conceptual redundancy</p>
</li>
<li>
<p><em>Guideline 13:</em> Avoid inefficient language elements</p>
</li>
</ul>
</div>
</li>
<li>
<p>Concrete Syntax</p>
<div class="ulist">
<ul>
<li>
<p><em>Guideline 14:</em> Adopt existing notations domain experts use</p>
</li>
<li>
<p><em>Guideline 15:</em> Use descriptive notations</p>
</li>
<li>
<p><em>Guideline 16:</em> Make elements distinguishable</p>
</li>
<li>
<p><em>Guideline 17:</em> Use syntactic sugar appropriately</p>
</li>
<li>
<p><em>Guideline 18:</em> Permit comments</p>
</li>
<li>
<p><em>Guideline 19:</em> Provide organizational structures for models</p>
</li>
<li>
<p><em>Guideline 20:</em> Balance compactness and comprehensibility</p>
</li>
<li>
<p><em>Guideline 21:</em> Use the same style everywhere</p>
</li>
<li>
<p><em>Guideline 22:</em> Identify usage conventions</p>
</li>
</ul>
</div>
</li>
<li>
<p>Abstract Syntax</p>
<div class="ulist">
<ul>
<li>
<p><em>Guideline 23:</em> Align abstract and concrete syntax</p>
</li>
<li>
<p><em>Guideline 24:</em> Prefer layout which does not affect translation from concrete to abstract syntax</p>
</li>
<li>
<p><em>Guideline 25:</em> Enable modularity</p>
</li>
<li>
<p><em>Guideline 26:</em> Introduce interfaces</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>As can be seen from the list, the paper considers language design at a very abstract level and thus really contributes to the field. It applies generically if you want to design a file format. But it also applies in a specific context; for example if you want to redesign how mathematicians write their formulas. It is the only set of guidelines which also talks about notations.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="talk-38c3-fearsome-file-formats">Talk “38C3 - Fearsome File Formats”</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Ange Albertini <a href="#talk38c3">[talk38c3]</a> has built up a lot of expertise on file formats over the last decade and gives an overview in this talk about how file formats can be abused and repurposed. From this talk, I want to focus only on one slide in particular, which lists recommendations on how to design a “good file format”:</p>
</div>
<div class="imageblock center">
<div class="content">
<img src="/assets/img2025-08-05_ange-albertini_ten-commandments.png" alt="“Commandments of a good file format”" width="80%">
</div>
</div>
<div class="paragraph">
<p>In text, the ten commandments are:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Magic at offset zero (fast identification, no bypass)</p>
</li>
<li>
<p>Clear chunk structure (forward compatibility, easy parsing/cleanup)</p>
</li>
<li>
<p>Version number (forward thinking)</p>
</li>
<li>
<p>No duplicity (duplicity → discrepancy)</p>
</li>
<li>
<p>No “constant” variables (ossification → hardcoding)</p>
</li>
<li>
<p>Up-to-date specs (reflect reality)</p>
</li>
<li>
<p>Samples set (Theory isn&#8217;t enough)</p>
</li>
<li>
<p>Extensibility (your format will evolve in unknown ways)</p>
</li>
<li>
<p>Keep the spirit (don&#8217;t reuse formats for different intent without trivial distinction)</p>
</li>
<li>
<p>Perfect is the enemy of good (shortcuts will be taken to avoid over-complexity)</p>
</li>
</ol>
</div>
</div>
</div>
<div class="sect1">
<h2 id="article-on-file-formats">Article “On File Formats”</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Rather recently, Jari Komppa wrote an article <a href="#articleKOMPPA">[articleKOMPPA]</a> on the design of file formats (<a href="https://news.ycombinator.com/item?id=44049252">HackerNews discussion</a>). He lists the following recommendations:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Does a file format exist for this yet?</p>
</li>
<li>
<p>Does it need to be human readable?</p>
</li>
<li>
<p>Chunk your binaries.</p>
</li>
<li>
<p>Allow partial parsing.</p>
</li>
<li>
<p>Version your formats.</p>
</li>
<li>
<p>Document your format.</p>
</li>
<li>
<p>Don&#8217;t include fields just in case.</p>
</li>
<li>
<p>Consider the target hardware.</p>
</li>
<li>
<p>Compression.</p>
</li>
<li>
<p>On filename extensions (i.e. consider four letters, but three letters are mostly allocated).</p>
</li>
</ol>
</div>
</div>
</div>
<div class="sect1">
<h2 id="paper-little-languages">Paper “Little languages”</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The paper by Jon Bentley <a href="#paperLITTLE">[paperLITTLE]</a> is by far the oldest. I do believe the notion of domain-specific languages was not sufficiently developed at that time and he coined the notion of <em>little languages</em>. The most prominent example from the paper is <a href="https://en.wikipedia.org/wiki/AWK">awk</a> as a little language. The idea is that domain-specific languages shall be developed and tools like awk help with the first step. He cites “an old rule of thumb” that “the first 10% of programming effort provide 90% of the functionality”. Only if the language evolves sufficiently should developers consider using parsing tools like lex and yacc. Regarding the design of such languages, the paper lists the following recommendations:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">Orthogonality</dt>
<dd>
<p>keep unrelated features unrelated.</p>
</dd>
<dt class="hdlist1">Generality</dt>
<dd>
<p>use an operation for many purposes.</p>
</dd>
<dt class="hdlist1">Parsimony</dt>
<dd>
<p>delete unneeded operations.</p>
</dd>
<dt class="hdlist1">Completeness</dt>
<dd>
<p>can the language describe all objects of interest?</p>
</dd>
<dt class="hdlist1">Similarity</dt>
<dd>
<p>make the language as suggestive as possible.</p>
</dd>
<dt class="hdlist1">Extensibility</dt>
<dd>
<p>make sure the language can grow.</p>
</dd>
<dt class="hdlist1">Openness</dt>
<dd>
<p>let the user “escape” to use related tools.</p>
</dd>
</dl>
</div>
</div>
</div>
<div class="sect1">
<h2 id="discussion">Discussion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I want to start this discussion with a definition. Technically, there is no notion of a “binary file” and a “text file”. In practice, the distinction helps because we can immediately align our expectation whether we need a text editor (text file) or specialized software (binary file) to handle the file. What is the distinction?</p>
</div>
<div class="sect2">
<h3 id="text-files-versus-binary-files">Text files versus binary files</h3>
<div class="paragraph">
<p>Fundamentally, text means that some serialization format or character set exists to encode text. The famous <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> (American Standard Code for Information Interchange) standard covers 128 characters and other sets are not considered ASCII these days. The other famous text encoding is <a href="https://en.wikipedia.org/wiki/Unicode">Unicode</a> which has taken up the tremendous effort to cover all used writing systems of the world in one encoding. To serialize Unicode scalars into actual bytes, different encodings can be chosen. <a href="https://en.wikipedia.org/wiki/UTF-16">UCS-2 and UTF-16</a> are legacy choices, but <a href="https://en.wikipedia.org/wiki/UTF-32">UTF-32</a> can still be found. Unlike UTF-32, <a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8</a> has the advantage of backwards-compatibility with ASCII: every ASCII file is also a valid UTF-8 file. The first 256 Unicode code points even coincide with <a href="https://en.wikipedia.org/wiki/ISO/IEC_8859-1">ISO 8859-1</a>, although code points above 0x7F are serialized differently in UTF-8. Furthermore, UTF-8 uses less space to serialize text in common languages like English. Since each character uses one to four bytes in UTF-8, the same text encoded in UTF-32 is usually longer; in exchange, UTF-8 text cannot be indexed by codepoint in constant time. To summarize: there is a long history of character sets including <a href="https://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a>, <a href="https://en.wikipedia.org/wiki/Mac_OS_Roman">Mac OS Roman</a>, <a href="https://en.wikipedia.org/wiki/Windows-1252">Windows-1252</a>, <a href="https://en.wikipedia.org/wiki/Shift_JIS">Shift-JIS</a>, <a href="https://en.wikipedia.org/wiki/Cork_encoding">Cork</a>, and <a href="https://en.wikipedia.org/wiki/UTF-8#Surrogates">WTF-8</a>. But with a current adoption above 98% within the World Wide Web, UTF-8 is the de-facto standard (backed by manifestos like <a href="https://utf8everywhere.org/">UTF-8 Everywhere</a>) and commonly picked as character set for new text file formats.
And someone who thinks Unicode is unnecessary for English text must be considered a bit naïve.</p>
</div>
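<div class="paragraph">
<p>To make the size trade-off concrete, a small Python sketch can count code points and encoded bytes for a few strings. The helper name <code>sizes</code> and the sample strings are made up for illustration.</p>
</div>

```python
def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for a string."""
    utf8 = text.encode("utf-8")
    utf32 = text.encode("utf-32-be")  # the "-be" variant writes no byte order mark
    return len(text), len(utf8), len(utf32)


# ASCII text, Latin text with a diaeresis, Japanese text, and an emoji
for sample in ["file format", "na\u00efve", "\u65e5\u672c\u8a9e", "\U0001F642"]:
    print(repr(sample), sizes(sample))

# ASCII-only text is byte-identical in ASCII and UTF-8
assert "file format".encode("ascii") == "file format".encode("utf-8")
```

<div class="paragraph">
<p>UTF-32 always spends four bytes per code point, so the ASCII-only sample grows from 11 to 44 bytes, whereas the emoji needs four bytes in both encodings.</p>
</div>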
<div class="paragraph">
<p>Going back to the question of binary versus text, the result is that some file formats declare that only byte sequences according to the declared character set are considered admissible files; namely text files. This is put into contrast to binary files where any sequence of bytes is admissible per default. Recognize that file formats like <a href="https://html.spec.whatwg.org/multipage/semantics.html#charset:content-type-2">HTML require you to declare the character set</a>, formats like <a href="https://en.wikipedia.org/wiki/PDF#Text">PDF allow you to switch character encoding in the file as often as desired</a>, and formats like <a href="https://www.rfc-editor.org/rfc/rfc8187">HTTP headers are so complex that custom RFCs were written</a>. One counterexample is the <a href="https://toml.io/en/">TOML file format</a> which declares that “a TOML file must be a valid UTF-8 encoded Unicode document” and therefore is a genuine text file format.</p>
</div>
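<div class="paragraph">
<p>What “admissible” means for a text file can be checked mechanically. The following Python sketch assumes a format that, like TOML, demands valid UTF-8; the function name <code>is_valid_utf8</code> is made up for illustration.</p>
</div>

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b'title = "caf\xc3\xa9"'))  # UTF-8 encoded text
print(is_valid_utf8(b'title = "caf\xe9"'))      # the same text in ISO 8859-1
```

<div class="paragraph">
<p>A binary format has no such gate: both byte strings would be equally admissible input.</p>
</div>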
</div>
<div class="sect2">
<h3 id="what-a-text-encoding-contributes">What a text encoding contributes</h3>
<div class="paragraph">
<p>One notorious problem with file format definitions is that people think that terms like “whitespace”, “hyphen”, or “line break” are universal, unambiguous names for characters. No, no, and no. Instead the notion of <a href="https://en.wikipedia.org/w/index.php?title=Unicode_character_property&amp;oldid=1295132189">whitespace</a> (more specifically <em>Unicode scalars with Whitespace property</em>), hyphen (more specifically <em>U+002D - HYPHEN-MINUS</em>), and line break (more specifically <em>Mandatory break according to <a href="https://www.unicode.org/reports/tr14/">UAX#14</a></em>) specifically come from text encodings like Unicode.</p>
</div>
<div class="paragraph">
<p>If you don&#8217;t specify the text encoding, I don&#8217;t know what those words mean. For Unicode encodings like UTF-8 or UTF-16, I clarified the meaning above. If you use ASCII instead of Unicode, everyone understands that “whitespace” means 0x20, the space character as only representative of this group. If you mention hyphen, it is even less ambiguous than in the Unicode case, where the less common <a href="https://en.wikipedia.org/w/index.php?title=Soft_hyphen&amp;oldid=1305884138">U+00AD SOFT HYPHEN</a> exists (among others). In the case of ASCII, “hyphen” means 0x2D unambiguously. But in ASCII there is no line break definition at all. Is 0x0A (“line feed”) a line break? Is 0x0D (“carriage return”) a line break? Both? Conventionally, 0x0A is a line break on Linux machines and the sequence 0x0D 0x0A is a line break on Windows machines. But why is 0x0C (“form feed”, i.e. a page break) not a line break? If you define a page break, you necessarily contribute a line break?! We are never going to know this, because ASCII does not define what a line break is.</p>
</div>
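<div class="paragraph">
<p>How much the answer depends on the chosen definition can be seen in any standard library. The following Python sketch splits the same made-up sample text once at LINE FEED only and once with <code>str.splitlines()</code>, which uses a broader set of line boundaries.</p>
</div>

```python
text = "page one\x0cpage two\r\nline three\nline four"

# Splitting at LINE FEED only: form feed and carriage return stay in the data
print(text.split("\n"))

# str.splitlines() also treats carriage return, CR+LF, and form feed
# (among others) as line boundaries
print(text.splitlines())

# "whitespace" depends on the definition as well:
# U+00A0 NO-BREAK SPACE counts as whitespace for Python strings
print("\u00a0".isspace())
```

<div class="paragraph">
<p>Two parsers that both claim to read the file “line by line” can therefore disagree on the number of lines unless the specification names the exact characters.</p>
</div>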
<div class="paragraph">
<p>If you actually decide to use a well-thought-out standard like Unicode, you can answer difficult questions quickly in a standardized way as well: Is <a href="https://www.unicode.org/reports/tr15/">Unicode normalization</a> semantically meaningful?</p>
</div>
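<div class="paragraph">
<p>The normalization question is easy to demonstrate. In this Python sketch, two strings that render identically compare as different until both are brought into the same normalization form; the variable names are made up for illustration.</p>
</div>

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

<div class="paragraph">
<p>A file format that compares keys or identifiers has to state whether such strings are considered equal.</p>
</div>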
<div class="paragraph">
<p>If you don&#8217;t specify the text encoding, I literally know nothing about the content. I don&#8217;t know what a whitespace character is. I don&#8217;t even know what a character is, because I don&#8217;t know how many bytes constitute a character.</p>
</div>
</div>
<div class="sect2">
<h3 id="regarding-syntax-escaping">Regarding syntax escaping</h3>
<div class="paragraph">
<p>Some binary file formats and most text file formats have some requirement like “arbitrary user content follows”. In this setting, you really don&#8217;t know when the user content is finished. As a result, you are going to need some byte sequence which tells “user content finishes here” which is not interpreted as user content itself. You need to escape the “user content syntax”.</p>
</div>
<div class="paragraph">
<p>I wrote <a href="/articles/2022-07-17-concept-of-syntax-escaping">an article about syntax escaping some time ago</a>, but the gist is this: syntax escaping can be avoided by a length specifier which is only practical for binary files (never let a user count bytes or Unicode codepoints). Otherwise, you can decide to declare one byte sequence to be “escaping”. If you repeat this byte sequence, it regains its original meaning, but otherwise some escaping sequence is started which might signify something like “user content stops here”.</p>
</div>
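<div class="paragraph">
<p>The doubling scheme can be sketched in a few lines of Python. The concrete choices are assumptions for illustration only: a backslash as escape character and the sequence backslash plus <code>e</code> as the “user content stops here” marker. The names <code>write_content</code> and <code>read_content</code> are hypothetical.</p>
</div>

```python
ESC = "\\"   # hypothetical escape character
END = "e"    # ESC followed by END marks the end of the user content


def write_content(content):
    """Serialize user content: double every ESC, then append the end marker."""
    return content.replace(ESC, ESC + ESC) + ESC + END


def read_content(stream):
    """Parse one user content from the start of `stream`.

    Returns the decoded content and the remaining, unparsed text.
    """
    out = []
    i = 0
    while i != len(stream):
        if stream[i] != ESC:
            out.append(stream[i])
            i += 1
            continue
        follower = stream[i + 1:i + 2]
        if follower == ESC:      # doubled ESC regains its literal meaning
            out.append(ESC)
        elif follower == END:    # escape sequence "user content stops here"
            return "".join(out), stream[i + 2:]
        else:
            raise ValueError(f"unknown escape sequence at offset {i}")
        i += 2
    raise ValueError("user content is not terminated")


serialized = write_content("C:\\temp\\e") + "next field"
print(serialized)
print(read_content(serialized))
```

<div class="paragraph">
<p>Because every literal escape character is doubled, arbitrary user content survives the round trip, including content that happens to contain the end marker itself.</p>
</div>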
<div class="paragraph">
<p>The worst thing you can do is to ignore the problem. If you allow arbitrary user content, but don&#8217;t declare an escaping mechanism, you either open yourself up to ambiguities or violate the requirement “<strong>arbitrary</strong> user content”. My personal opinion is that XML&#8217;s escaping mechanism is simple and extensible compared to other approaches.</p>
</div>
</div>
<div class="sect2">
<h3 id="regarding-file-extensions">Regarding file extensions</h3>
<div class="paragraph">
<p>File extensions give the operating system a clue which application might be capable of interpreting a file. For historic reasons, they tend to be short sequences (2 to 4) of Latin characters. They are an incomplete concept leading to unintended collisions. For example, all kinds of markup syntaxes are declared as <code>.md</code> files these days. Historically, <code>.txt</code> used to be full of collisions. But it still makes sense to align all users upon one file extension.</p>
</div>
<div class="paragraph">
<p>What I would like to stress here as well is the MIME type. It is equally helpful to align all users upon one MIME type. The <code>x-</code> prefix opens up MIME types to custom standards. So <code>text/x-foobar</code> would be a valid choice.</p>
</div>
</div>
<div class="sect2">
<h3 id="regarding-magic-numbers">Regarding magic numbers</h3>
<div class="paragraph">
<p>One might think that magic numbers are unnecessary boilerplate. If the specified structure is unique for your file format anyhow, why should a magic number be necessary? The answer is simple: Not all tools want to look at the entire document structure to determine whether a file follows a certain file format. If they only have to read some leading bytes (namely the so-called <em>magic number</em>), they can determine much more quickly whether the file is interesting. <sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnotedef_1" title="View footnote.">1</a>]</sup></p>
</div>
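<div class="paragraph">
<p>A sniffing tool along these lines can be sketched in Python. The three signatures are the well-known leading bytes of PNG, PDF, and GIF files; the table is deliberately tiny, and the names <code>MAGIC_NUMBERS</code>, <code>identify</code>, and <code>identify_file</code> are made up for illustration.</p>
</div>

```python
# Well-known magic numbers at offset zero
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"GIF89a": "GIF image",
}


def identify(leading_bytes):
    """Guess the file format from the first bytes of a file."""
    for magic, name in MAGIC_NUMBERS.items():
        if leading_bytes.startswith(magic):
            return name
    return None


def identify_file(path):
    """Read no more bytes from `path` than the longest magic number needs."""
    longest = max(len(magic) for magic in MAGIC_NUMBERS)
    with open(path, "rb") as handle:
        return identify(handle.read(longest))


print(identify(b"%PDF-1.7\n"))
print(identify(b"just some text"))
```

<div class="paragraph">
<p>No matter how large the file is, <code>identify_file</code> reads at most eight bytes here.</p>
</div>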
</div>
<div class="sect2">
<h3 id="regarding-version-numbers">Regarding version numbers</h3>
<div class="paragraph">
<p>The simple argument for version numbers is “implementors can easily dispatch interpretation”. If your document follows the specification 1.0, the source code for 1.0 interprets your file. If your document follows the specification 2.0, the source code for 2.0 interprets your file. This way you can easily introduce backwards-incompatible version changes.</p>
</div>
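<div class="paragraph">
<p>Such a dispatch can be sketched in Python as follows. Everything about the format is hypothetical: the version is assumed to be a single byte at offset zero, version 1 stores a name behind a one-byte length, and version 2 widens the length to two bytes in big endian, which is a backwards-incompatible change.</p>
</div>

```python
def parse_v1(body):
    # hypothetical version 1 layout: 1-byte length, then the name
    length = body[0]
    return body[1:1 + length].decode("utf-8")


def parse_v2(body):
    # hypothetical version 2 layout: 2-byte big-endian length, then the name
    length = int.from_bytes(body[:2], "big")
    return body[2:2 + length].decode("utf-8")


PARSERS = {1: parse_v1, 2: parse_v2}


def parse(document):
    """Dispatch on the version byte stored at offset zero."""
    if not document:
        raise ValueError("empty document")
    version, body = document[0], document[1:]
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported format version {version}")
    return parser(body)


print(parse(b"\x01\x03abc"))       # read by the version 1 code
print(parse(b"\x02\x00\x03abc"))   # read by the version 2 code
```

<div class="paragraph">
<p>Without the version byte, the parser would have to guess from the content which of the two layouts it is looking at.</p>
</div>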
</div>
<div class="sect2">
<h3 id="design-requires-re-re-re-iteration">Design requires re-re-re-iteration</h3>
<div class="paragraph">
<p>File format design is design. Every design needs iteration for perfection and consistency. Please finish the draft version in a straightforward, use-case-centered manner. But be open to improving upon your design in subsequent versions. Iterate and iterate and iterate. And re-iterate again. And ask your target audience for their opinion. Then you have mastered the art.</p>
</div>
</div>
<div class="sect2">
<h3 id="about-the-general-approach-to-design">About the general approach to design</h3>
<div class="paragraph">
<p>One thing I would like to point out, which comes from programming language design, is that design should go from specific cases to generality. What is meant is that you can specify very extensible elements in your syntax, but you should define those elements only for their specific cases and disallow all others. In subsequent versions, you might understand which other cases exist and which cases make sense. Under these circumstances, you might open up that element for more (or more general) cases.</p>
</div>
<div class="paragraph">
<p>Let me illustrate this with a trivial example: You might have 10 specific cases to distinguish, but you have to use one byte as discriminant. Therefore 256 cases can be distinguished, but you only need 10 cases. Now the general approach to design can be done in the following <em>wrong</em> way:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Specify the 10 cases</p>
</li>
<li>
<p>Declare 246 cases to be “implementor-defined”</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Instead the following <em>correct</em> way can be taken:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Specify the 10 cases</p>
</li>
<li>
<p>Disallow 246 cases</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The point of the latter approach is not that “implementor-defined is always stupid”. If you are certain, please specify (for example) 10 values for the desired cases and 20 values for “implementor-defined” use cases, but be aware that disallowing the remaining values is what opens up extensibility in future versions. You should restrict your design tightly. Once you have gained experience and feedback, you can open up to other cases. Being restricted in the beginning enables the necessary extensibility later.</p>
</div>
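<div class="paragraph">
<p>In a parser, “disallow” simply means that unspecified discriminant values are rejected instead of being passed through. A minimal Python sketch, in which the three cases and the names <code>Kind</code> and <code>read_kind</code> are made up for illustration:</p>
</div>

```python
from enum import Enum


class Kind(Enum):
    # hypothetical discriminant values; a real format would list all of its cases
    INTEGER = 0x01
    TEXT = 0x02
    BOOLEAN = 0x03


def read_kind(byte):
    """Map a discriminant byte to a specified case; everything else is an error."""
    try:
        return Kind(byte)
    except ValueError:
        raise ValueError(
            f"discriminant 0x{byte:02X} is not allowed in this version"
        ) from None


print(read_kind(0x02))
```

<div class="paragraph">
<p>Since no conforming file can contain any of the remaining values, a later version is free to assign them a meaning without colliding with files in the wild.</p>
</div>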
</div>
<div class="sect2">
<h3 id="a-generic-approach-for-defining-binary-file-formats">A generic approach for defining binary file formats</h3>
<div class="paragraph">
<p>There is a simple design which can model any binary data model unambiguously. It is called TLV (<a href="https://en.m.wikipedia.org/wiki/Type%E2%80%93length%E2%80%93value">Type-Length-Value</a>).</p>
</div>
<div class="paragraph">
<p>The file format has to follow the general recommendations first. Introduce a magic number. Introduce a version number. Put your metadata in a header. And then let us write down the data in the body of the file.</p>
</div>
<div class="paragraph">
<p>The body is a sequence of entries. Every entry consists of a type, a length, and a value.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>A type is a discriminator (thus, of fixed width) telling what kind of data you are supposed to expect in the value. For example, byte <code>0x13</code> might identify the value as an unsigned integer in big endian. <code>0x13</code> is one instance of this type.</p>
</li>
<li>
<p>A length specifies how many bytes the value consists of. It needs to be of fixed width as well and common choices include 16 or 32 bits. For example, bytes <code>0x00 0x04</code> might identify length 4 and thus our example value is expected to be an unsigned integer in big endian of 4 bytes.</p>
</li>
<li>
<p>The value is a sequence of bytes. Because of the previous values, you know exactly how many bytes you are supposed to read to understand the value. Furthermore, we attached some semantics through the type discriminator. The specification is now supposed to specify how to interpret those bytes of this type.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>You have to encode a boolean value? Designate a type, length one should always be required, and specify which two values are admissible. Done.<br>
You have to encode Unicode text? Designate a type, the length can be adjusted to the actual byte length, and specify which encoding you require in the value. Done.</p>
</div>
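<div class="paragraph">
<p>A body of TLV entries can be written and read with very little code. The Python sketch below assumes a one-byte type and a 16-bit length in big endian, as in the examples above. The type value <code>0x13</code> is taken from the text, while <code>0x20</code> and <code>0x21</code> as well as the function names are hypothetical.</p>
</div>

```python
TYPE_UINT = 0x13   # type value taken from the example above
TYPE_TEXT = 0x20   # hypothetical: UTF-8 encoded text
TYPE_BOOL = 0x21   # hypothetical: one byte, 0x00 or 0x01


def encode_entry(type_id, value):
    """Serialize one entry as type (1 byte), length (2 bytes, big endian), value."""
    return bytes([type_id]) + len(value).to_bytes(2, "big") + value


def decode_entries(body):
    """Yield (type, value) pairs from a sequence of TLV entries."""
    offset = 0
    while offset != len(body):
        header = body[offset:offset + 3]
        if len(header) != 3:
            raise ValueError("truncated entry header")
        type_id = header[0]
        length = int.from_bytes(header[1:], "big")
        value = body[offset + 3:offset + 3 + length]
        if len(value) != length:
            raise ValueError("truncated entry value")
        yield type_id, value
        offset += 3 + length


body = (
    encode_entry(TYPE_UINT, (1234).to_bytes(4, "big"))
    + encode_entry(TYPE_TEXT, "na\u00efve".encode("utf-8"))
    + encode_entry(TYPE_BOOL, b"\x01")
)
for type_id, value in decode_entries(body):
    print(hex(type_id), value)
```

<div class="paragraph">
<p>The reader does not need to understand a type in order to skip its entry, because the length alone tells where the next entry starts. Interpreting the value bytes per type is left to the specification.</p>
</div>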
<div class="paragraph">
<p>This design is very simple and very generic. Does it solve all problems and do all binary files become instances of the TLV design? No. TLV is generic and a very good guideline. However, it applies only to binary files (no human wants to think through a three-step process all the time and particularly count bytes) and binary files mainly exist because they optimize some requirements over text files. Binary files might optimize parsing performance or space usage. As a result, designers start to skip fields. For example, if an entry of type <code>0x42</code> is always preceded by an entry of type <code>0x41</code>, space-optimizing designers might claim that the type byte <code>0x42</code> can be left out. Indeed, position-defined entries do not need a type if it can be derived from the position index. But in this very moment, the TLV design principle is violated and TLV remains only as a “general rule of thumb”.</p>
</div>
<div class="paragraph">
<p>Everyone should be familiar with TLV and follow it for binary files, if ambiguity-freedom cannot be guaranteed for the entire design. A similar design where values usually represent <em>chunks</em> is the <a href="https://en.wikipedia.org/wiki/Interchange_File_Format">Interchange File Format</a>.</p>
</div>
</div>
<div class="sect2">
<h3 id="target-hardware-and-endianness">Target hardware and endianness</h3>
<div class="paragraph">
<p>When it comes to binary files, endianness must be specified. Usually deciding upon this value directly leads to the question of target hardware platforms. Endianness can be defined arbitrarily. Big endian or little endian? It is trivial. Just pick one. But the only <em>meaningful</em> way to pick the best value is thinking about the target platform. Does it target Intel machines? Then little endian makes more sense. Are humans going to look at the values from time to time? Big endian might be more convenient, but less optimized for desktop computer hardware.</p>
</div>
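<div class="paragraph">
<p>The difference is easy to see in a hex dump. A short Python sketch with an arbitrary example value:</p>
</div>

```python
value = 0x12345678

print(value.to_bytes(4, "big").hex(" "))      # 12 34 56 78
print(value.to_bytes(4, "little").hex(" "))   # 78 56 34 12

# The reader has to apply the same byte order as the writer
raw = value.to_bytes(4, "little")
print(hex(int.from_bytes(raw, "little")))     # 0x12345678
print(hex(int.from_bytes(raw, "big")))        # 0x78563412
```

<div class="paragraph">
<p>Big endian matches the order in which humans write the number, which is why it is easier to read in a hex editor.</p>
</div>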
<div class="paragraph">
<p>You should know your target domain, your target audience, and common hardware platforms. But don&#8217;t optimize prematurely.</p>
</div>
<div class="paragraph">
<p>If you start cramming all data into <a href="https://en.wikipedia.org/wiki/Bit_array">bitvectors</a> to save a few bytes to optimize space, you neglect that machines are optimized to operate on bytes and cache lines. Extracting individual bits is a time-consuming operation. You might be better off adding a few unused bits exchanging space for time.</p>
</div>
<div class="paragraph">
<p>In the end, benchmarking your prototype parsing implementation reveals the actually interesting parts to optimize. This becomes especially important, if you plan to apply compression on parts of your data.</p>
</div>
</div>
<div class="sect2">
<h3 id="syntax-and-semantics">Syntax and semantics</h3>
<div class="paragraph">
<p>My final point would be that syntax and semantics are two different concepts. You need to be aware of it. You may be able to use the syntax of an existing standard and define custom semantics on top of it. One example would be <a href="https://www.w3.org/TR/2006/REC-xml11-20060816/">XML</a> (syntax) and <a href="https://www.w3.org/2002/mmi/ink">InkML</a> (semantics). You may also define a new syntax like <a href="https://en.wikipedia.org/w/index.php?title=Simple_Outline_XML&amp;oldid=1280830136">Simple Outline XML</a> for existing semantics like XML.</p>
</div>
<div class="paragraph">
<p>I gave examples for text files here, but this also applies to binary files. The only problem is that binary files usually have a very specific data model which differs from other formats. But generic serialization standards include <a href="https://en.wikipedia.org/w/index.php?title=ASN.1&amp;oldid=1305031387">ASN.1</a>, <a href="https://github.com/edn-format/edn">edn</a> (which is text-based), and postcard (<a href="https://www.youtube.com/watch?v=HtBFvTH5ZKE">introductory talk on youtube</a>).</p>
</div>
<div class="paragraph">
<p>If you are able to split syntax and semantics, you end up in one of two scenarios:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>you can use an established, proven-in-practice standard for one of the two components. The necessary tools do not need to be written again.</p>
</li>
<li>
<p>you get the possibility to remove one component and exchange it for something else, if you recognize a mistake. Programming languages like <a href="https://en.wikipedia.org/wiki/Dylan_(programming_language)#Syntax">Dylan</a> just removed their LISP-style syntax and introduced something new.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Not too bad, right?</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="decision-list">Decision list</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Finally, I want to contribute a decision list where items to consider are listed:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Are you sure the effort of defining a new file format is worth it? If no, stop. If yes, proceed.</p>
</li>
<li>
<p>Split syntax and semantics. Can you reuse syntax (e.g. XML, S-expressions, JSON, YAML, …) or semantics (e.g. JSON, YAML, ASN.1, postcard, …) of existing ones?</p>
</li>
<li>
<p>The following holds true for text and binary files:</p>
<div class="ulist">
<ul>
<li>
<p>No duplicity (duplicity → discrepancy)</p>
</li>
<li>
<p>Add a version number. No exceptions. Consider <a href="https://semver.org/">versioning schemes</a> to communicate to users which expectations regarding compatibility and upgrade necessity apply.</p>
</li>
<li>
<p>Ask for feedback regarding syntax (text files) and data model (text files and binary files)</p>
</li>
<li>
<p>Avoid unnecessary generality. It is easier to permit features later on than to standardize elements once they are already in use.</p>
</li>
<li>
<p>Ask a domain expert for feedback and iterate.</p>
</li>
<li>
<p>Increment the version number and release.</p>
</li>
<li>
<p>Which interfaces to embed content for other file formats do you provide?</p>
</li>
<li>
<p>Revise which elements provide modularity and extensibility in your file format.</p>
</li>
</ul>
</div>
</li>
<li>
<p>Is it going to be a text file?</p>
<div class="ulist">
<ul>
<li>
<p>Declare the character set (UTF-8 is recommended)</p>
</li>
<li>
<p>Unicode code points are written in U-notation: <code>U+002C COMMA</code>. Other character sets commonly use plain hexadecimal notation like <code>0x2C</code>. If you refer to a character, get used to this notation and use it exclusively.</p>
</li>
<li>
<p>Depending on your character set, you may use words like ‘whitespace’ now to describe your file format</p>
</li>
<li>
<p>Do you want to base your syntax on existing concepts and notation? Remember that any existing technology exists because it has some valuable benefits. But you are designing something new because it does not satisfy your requirements. Comprehend the benefits, integrate them into your design, and iterate multiple times to achieve consistency.</p>
</li>
<li>
<p>Define the syntax escaping mechanism if arbitrary user content is allowed</p>
</li>
<li>
<p>Discuss punctuation versus keywords. Punctuation is a small set of characters and thus brief. But only programmers are used to using it in various contexts. Keywords are longer and extensible, but you have to discuss questions like casing and singular versus plural (depending on the writing system and language).</p>
</li>
<li>
<p>Discuss whether you want to include comments (elements which carry no semantics, but provide an opportunity for documentation to the author)</p>
</li>
<li>
<p>Discuss whether you want to allow trailing separators (e.g. <code>["item1", "item2",]</code> if comma is your separator)</p>
</li>
</ul>
</div>
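<div class="paragraph">
<p>The items on U-notation, escaping, and trailing separators can be sketched together in a few lines of Python. This is only an illustration under assumptions: the comma as separator, the backslash as escape character, and the names <code>describe</code> and <code>split_items</code> are hypothetical choices, not a recommendation for your format.</p>
</div>

```python
import unicodedata


def describe(char):
    """Name a character in U-notation, e.g. 'U+002C COMMA'."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


def split_items(line, separator=",", escape="\\"):
    """Split a line at the separator. The escape character makes the
    following character literal. One trailing separator is tolerated."""
    items, current, chars = [], [], iter(line)
    for char in chars:
        if char == escape:
            following = next(chars, None)
            if following is None:
                raise ValueError(f"dangling {describe(escape)} at end of input")
            current.append(following)
        elif char == separator:
            items.append("".join(current))
            current = []
        else:
            current.append(char)
    if current:
        # text after the last separator is a regular item
        items.append("".join(current))
    return items
```

<div class="paragraph">
<p>Here <code>split_items("item1,item2,")</code> and <code>split_items("item1,item2")</code> yield the same two items, and an escaped separator stays part of its item. Error messages refer to characters in U-notation, so there is no doubt which character is meant.</p>
</div>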
</li>
<li>
<p>Is it going to be a binary file?</p>
<div class="ulist">
<ul>
<li>
<p>Add a magic number at offset zero</p>
</li>
<li>
<p>Declare: are multi-byte values encoded in little endian or big endian?</p>
</li>
<li>
<p>Declare and illustrate the big picture structure of your file format (e.g. header/body/footer). It is easy to get lost in details (or <em>bore the hell out of the reader</em>) when describing binary file formats.</p>
</li>
<li>
<p>Enable parsers to skip structural parts of your file format (e.g. the entire body, because its length is declared)</p>
</li>
<li>
<p>Follow the TLV (type-length-value) design. Declare the semantics of values in the file.</p>
</li>
<li>
<p>If you don&#8217;t follow the TLV design, define the escaping mechanism (length declaration is recommended)</p>
</li>
</ul>
</div>
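<div class="paragraph">
<p>As a rough sketch of the binary items above (magic number, declared endianness, TLV records, and skippable parts), consider the following Python code. Everything concrete in it is a made-up assumption: the magic number <code>DEMO</code>, the record type <code>TYPE_TITLE</code>, and the layout of a 1-byte type followed by a 4-byte little-endian length.</p>
</div>

```python
import io

MAGIC = b"DEMO"  # hypothetical magic number at offset zero
TYPE_TITLE = 1   # hypothetical record type: UTF-8 encoded title


def write_record(out, record_type, value):
    """Write one TLV record: 1-byte type, 4-byte little-endian length, value."""
    out.write(bytes([record_type]))
    out.write(len(value).to_bytes(4, "little"))
    out.write(value)


def read_titles(data):
    """Return all title records and skip records of unknown type."""
    stream = io.BytesIO(data)
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("magic number mismatch")
    titles = []
    while True:
        header = stream.read(5)
        if not header:
            break  # clean end of file
        if len(header) != 5:
            raise ValueError("truncated record header")
        record_type = header[0]
        length = int.from_bytes(header[1:], "little")
        if record_type == TYPE_TITLE:
            value = stream.read(length)
            if len(value) != length:
                raise ValueError("truncated record value")
            titles.append(value.decode("utf-8"))
        else:
            # the declared length lets us skip content we do not understand
            stream.seek(length, io.SEEK_CUR)
    return titles
```

<div class="paragraph">
<p>Because every record declares its length, a parser that only knows <code>TYPE_TITLE</code> can still read a file containing newer record types. No escaping mechanism is needed since the value is never scanned for delimiters.</p>
</div>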
</li>
<li>
<p>Publication:</p>
<div class="ulist">
<ul>
<li>
<p>Provide example files.</p>
</li>
<li>
<p>Provide a specification document. Specify where people can direct their feedback to. Specify the version this document documents.</p>
</li>
<li>
<p>Develop tools to read, write, analyze, and fix files in this format.</p>
</li>
<li>
<p>Suggest a file extension for files. Suggest a MIME type for files.</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>File format definition is a difficult art, and hopefully I summarized some useful guidelines for you. If you succeed, people are going to enjoy writing parsers for your format. If you fail, your file format is going to suffer from fragmentation and limited adoption. Good luck!</p>
</div>
</div>
</div>
<div id="footnotes">
<hr>
<div class="footnote" id="_footnotedef_1">
<a href="#_footnoteref_1">1</a>. For UNIX users, a simple scenario is <code>grep</code> which has to decide whether a file is binary (to be ignored) or text (to be searched in).
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="cs" /><category term="software-development" /><summary type="html"><![CDATA[Update 2025-10-19: I added edn as mentioned file format.]]></summary></entry><entry xml:lang="eo"><title type="html">Revjuo: Aŭstria Esperanto-Kongreso 2025</title><link href="https://lukas-prokop.at/articles/2025-05-18-revjuo-a%C5%ADstria-esperanto-kongreso" rel="alternate" type="text/html" title="Revjuo: Aŭstria Esperanto-Kongreso 2025" /><published>2025-05-18T10:00:00+02:00</published><updated>2025-05-25T12:06:57+02:00</updated><id>https://lukas-prokop.at/articles/2025-05-18-revjuo-a%C5%ADstria-esperanto-kongreso</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-05-18-revjuo-a%C5%ADstria-esperanto-kongreso"><![CDATA[<div class="sect1">
<h2 id="enkonduko">Introduction</h2>
<div class="sectionbody">
<div class="paragraph">
<p>From 2 to 4 May 2025, the <a href="https://aek2025.esperanto-graz.at/">first Austrian Esperanto Congress</a> took place in Graz. Graz is the second-largest city of Austria and my former hometown. Recently Ewald became head of the Austrian Esperanto club (after his time as the Styrian head) and he dreamed of this event. Kind Esperantists from several countries helped to make it happen. Personally, I did not contribute to the organization because I was moving during that time; I only gave a talk.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="evento">Event</h2>
<div class="sectionbody">
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-09_progresanta-kongreso.jpg"><img src="../assets/img/2025-05-09_progresanta-kongreso.jpg" alt="The picture shows a large room with various Esperanto flags and an audience" width="100%"></a>
</div>
</div>
<div class="paragraph">
<p>Over three days, several talks took place. There was plenty of time between the talks because only 14 talks took place in total. Nan also contributed two sessions of qigong practice. On Friday and Saturday evening, there was a music event. In general, the talks were about cultures and languages. The congress venue was the building of the mayor&#8217;s political party.</p>
</div>
<div class="paragraph">
<p>I will mention only a few talks:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>In “Sopiro je Senco” (“Longing for Meaning”), Norina and Philipp explained how they rediscover logotherapy in the Esperanto community.</p>
</li>
<li>
<p>In “Unuiĝintaj Nacioj, kiel la reprezentantoj de UEA laboras tie” (“United Nations: how the representatives of UEA work there”), Mireille reported on her experience representing Esperanto at the UN.</p>
</li>
<li>
<p>The talk “Gravaj sciencistoj de Graz kaj Stirio” (“Important scientists of Graz and Styria”) was cool because I am sure every listener learned something about scientists from Graz. The speaker presented biographies of scientists and their important results.</p>
</li>
<li>
<p>I learned that the speaker Viŝnja has been learning Esperanto for only one year, and she already presented the topic “La vera genio: Spomenka Štimec” (“The true genius: Spomenka Štimec”) perfectly. It was a very interesting talk because I learned about the literature of many authors, and she also mentioned the relationship between the Esperanto monuments in Graz and Zagreb. I did not know about it at all; my apologies.</p>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-10_monumento-ĉe-graz-kaj-zagreb.jpg"><img src="../assets/img/2025-05-10_monumento-ĉe-graz-kaj-zagreb.jpg" alt="The picture shows a slide of two monuments located in Zagreb and Graz" width="100%"></a>
</div>
</div>
</li>
<li>
<p>In the talk “Esperanto kaj Rumantsch Grischun” (“Esperanto and Rumantsch Grischun”), I learned about the Rumantsch Grischun language spoken in Switzerland. I did not know about this language at all.</p>
</li>
<li>
<p>In his talk, Uli statistically compared translations of “The Little Prince” by Antoine de Saint-Exupéry into various languages. I would not say that every statistic was interesting (often it was merely a result of the writing system), but he made a good effort to collect statistical results and I liked it.</p>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-11_ulrich-parolas-pri-tradukoj-de-eta-princo.jpg"><img src="../assets/img/2025-05-11_ulrich-parolas-pri-tradukoj-de-eta-princo.jpg" alt="Ulrich, on the right, presents the slide on the left, which shows statistics about the number of characters per translation" width="100%"></a>
</div>
</div>
</li>
<li>
<p>Amusingly, in the third talk, Mireille presented photos from various countries showing baths, showers, laundry facilities, and toilets.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>My <a href="https://lukas-prokop.at/talks/2025-05-10_alireblo/lumbildoj.pdf">own talk</a> went acceptably. I spoke about accessibility in digital typesetting. I wanted to talk about tagged PDF files, but I realized that the topic would be too technical. So I extended the introduction about digital programs. Before the talk I understood that the four programs would not fit into 30 minutes. I removed my slides for two of the programs and started presenting. I liked my final slides a lot but still found an accusative error during the talk. In the end I hit the 30 minutes perfectly, but my speaking was bad. I knew the words but did not speak fluently enough about the topic. Keep trying, Lukas!</p>
</div>
<div class="paragraph">
<p>I have not yet talked about the various interactions between people. I talked with various kind people. I liked very much that various Esperantists organized a book service, music, a session with physical exercise, and helped to make the event happen.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="konkludo">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>It was a small event with about 70 participants. The organization was not that good, but it suited the participants well. Thank you for the contributions, Esperantists!</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="languages" /><category term="Esperanto" /><category term="reflection" /><summary type="html"><![CDATA[Enkonduko]]></summary></entry><entry xml:lang="en"><title type="html">Review: RustWeek 2025</title><link href="https://lukas-prokop.at/articles/2025-05-17-rustweek2025-review" rel="alternate" type="text/html" title="Review: RustWeek 2025" /><published>2025-05-17T09:00:00+02:00</published><updated>2025-05-25T12:06:15+02:00</updated><id>https://lukas-prokop.at/articles/2025-05-17-rustweek2025-review</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-05-17-rustweek2025-review"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>My company was generous and sent me to <a href="https://2025.rustweek.org/">RustWeek 2025</a> organized by RustNL in Utrecht, Netherlands. RustWeek is a conference for Rust developers, and many core contributors in particular can be found there. So this was a nice opportunity to get some educational input for the programming language. I visited the conference together with a work colleague.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="travelling">Travelling</h2>
<div class="sectionbody">
<div class="paragraph">
<p>We left Vienna by night train towards Amersfoort on Sunday. Arriving late on Monday, we went to our hotel. After unpacking our stuff, we got our conference badges, headed for lunch, and started our day of work very late. From my first impressions of Utrecht, I concluded that this is a very nice place for me. The people are nice and, as expected, I love the bicycling infrastructure. The canal Merwedekanaal benoorden de Lek gave me sea-style vibes (sure, as an Austrian my understanding of seas is very limited).</p>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-12_netherlands-windmill.jpg"><img src="../assets/img/2025-05-12_netherlands-windmill.jpg" alt="The picture shows a scene from the Netherlands in the afternoon with a windmill in the center" width="100%"></a>
</div>
<div class="title">Figure 1. The Netherlands are famous for their windmills</div>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-12_netherlands-pavement.jpg"><img src="../assets/img/2025-05-12_netherlands-pavement.jpg" alt="A street in Utrecht where pedestrians can walk on the left and right side of the street on a pavement and cars are parked next to it. The center allows cars to pass through the street and many trees make the street very green" width="100%"></a>
</div>
<div class="title">Figure 2. A common street situation in Utrecht (pavement made of bricks and many green trees)</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="conference-venue">Conference venue</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The conference took place at Kinepolis Jaarbeurs (the cinema of the conference center). <a href="https://hachyderm.io/@Mara/114502378870262985">Mara</a> turned this into a wonderful experience by subtly replacing the cinema&#8217;s promotional posters with Rust-themed ones. Splendid!</p>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-13_cinema-posters.jpg"><img src="../assets/img/2025-05-13_cinema-posters.jpg" alt="Several rust-themed cinema posters referring to well-known movies like Indiana Jones but calling it &quot;Raiders of the Lost Arc&lt;_&gt;&quot; instead" width="100%"></a>
</div>
<div class="title">Figure 3. Subtle Rust cinema posters</div>
</div>
<div class="paragraph">
<p>Even though the conference essentially took place the entire week, we only had tickets for the two conference days with talks.</p>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-13_rustweek2025-my-badge"><img src="../assets/img/2025-05-13_rustweek2025-my-badge.jpg" alt="My badge showing my name with a cinema room in the blurred background" width="50%"></a>
</div>
<div class="title">Figure 4. My conference badge</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="conference">Conference</h2>
<div class="sectionbody">
<div class="paragraph">
<p>On 2025-05-13, the talks started. The intro was very popular. It showed a giant Ferris orbiting the earth with the final title “RustWeek” in the style of the Universal Studios intro. They mentioned it took them three days to create it, including one day for rendering. Its duration is less than 30 seconds, if I remember correctly.</p>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-13_rustweek2025-ferris-intro"><img src="../assets/img/2025-05-13_rustweek2025-ferris-intro.jpg" alt="A photo of the RustWeek intro showing a giant ferris in front of Earth" width="100%"></a>
</div>
<div class="title">Figure 5. Rendered intro</div>
</div>
<div class="paragraph">
<p>There were some very interesting talks for me and I have not finished watching all <a href="https://www.youtube.com/@rustnederlandrustnl">recordings</a> yet. But I am going to say a few words about the following talks:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="https://rustweek.org/talks/alex">Alex Crichton on the question “10 Years of Rust: Why?”</a> gave a really suitable keynote for the event. He gave a rationale for the steps taken and tried to analyze what was necessary to reach 10 years of developing the language.</p>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-13_rustweek2025-alex-rust"><img src="../assets/img/2025-05-13_rustweek2025-alex-rust-10y.jpg" alt="Alex Crichton is seen on the right presenting his keynote inside the cinema and the slides on the left show the number of his git commits over time which decreased since he focused on wasmtime contributions" width="50%"></a>
</div>
</div>
</li>
<li>
<p>Due to my personal interest in digital typesetting, I am very grateful to Raph Levien and his work. <a href="https://rustweek.org/talks/raph/">Raph spoke about “Faster, easier 2D vector rendering”</a> and presented one algorithm to improve the rendering of fonts. He tried to reason about the use of GPUs for font rendering and explained which Rust crates come into play.</p>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-13_rustweek2025-raph-glyphs"><img src="../assets/img/2025-05-13_rustweek2025-raph-glyphs.jpg" alt="Raph Levien is shown at the right presenting his talk and on the left the slide showing a capital R glyph in a grid structure" width="50%"></a>
</div>
</div>
</li>
<li>
<p>I once wrote <a href="https://lukas-prokop.at/articles/2023-03-25-icu4x">a blog article about icu4x 1.0</a> when I started using it. I am using icu4x in the <a href="https://github.com/typho/opstr">opstr project</a>. <a href="https://rustweek.org/talks/shane/">Shane spoke about Beyond ICU4X 2.0 and future goals</a>. I promised him a review of icu4x 2.0 as well. Stay tuned on this blog!</p>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-13_rustweek2025-shane-icu4x"><img src="../assets/img/2025-05-13_rustweek2025-shane-icu4x.jpg" alt="Shane on the right of the picture is pointing towards the slides on the left and the slides say “i18n is portable lightweight and secure”" width="50%"></a>
</div>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>There are so many more talks to talk about. And simultaneously it is important to talk to the people. So many nice people with aspirations and dreams for the programming language. And I did not mention how much I enjoyed resolving some Rust issues in my head during the conference. A lot of kudos to my work colleague who is following core development efforts and helped me out several times. By the way, I identified 4 Austrians in total at the conference.</p>
</div>
<div class="paragraph">
<p>On the day after the talks, <a href="https://blog.rust-lang.org/2025/05/15/Rust-1.87.0/">Rust 1.87 was released live from Utrecht</a>. As I mentioned, we only participated on the two days with talks, and I left by night train towards Linz on Thursday evening.</p>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-14_rustweek2025-community.jpg"><img src="../assets/img/2025-05-14_rustweek2025-community.jpg" alt="the picture shows a conference venue hall filled with about 80 people chatting to each other" width="100%"></a>
</div>
<div class="title">Figure 6. Conference venue hall</div>
</div>
<div class="imageblock center">
<div class="content">
<a class="image" href="../assets/img/2025-05-14_rustweek2025-final-photo"><img src="../assets/img/2025-05-14_rustweek2025-final-photo.jpg" alt="RustWeek organizers gathered at the front stage in a cinema room to take a final conference photo together" width="100%"></a>
</div>
<div class="title">Figure 7. Final photo by the RustWeek organizers</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I absolutely enjoyed Utrecht. I absolutely enjoyed the conference. I was astonished at how well the RustNL community organized the conference. So I want to thank them a lot and hope for a wonderful conference for them in 2026 as well!</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="work" /><category term="reflection" /><category term="community" /><category term="rustlang" /><summary type="html"><![CDATA[Motivation]]></summary></entry><entry xml:lang="de"><title type="html">Aikidovereine in Graz und Salzburg</title><link href="https://lukas-prokop.at/articles/2025-04-13-aikido-vereine" rel="alternate" type="text/html" title="Aikidovereine in Graz und Salzburg" /><published>2025-04-13T00:00:01+02:00</published><updated>2025-05-25T12:07:24+02:00</updated><id>https://lukas-prokop.at/articles/2025-04-13-aikido-vereine</id><content type="html" xml:base="https://lukas-prokop.at/articles/2025-04-13-aikido-vereine"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I recently moved from Graz to Salzburg. I wanted to share my collection of aikido clubs which offer training sessions in Graz &amp; Salzburg. I only list training sessions which …</p>
</div>
<div class="ulist">
<ul>
<li>
<p>are listed on the website</p>
</li>
<li>
<p>take place at a location in the respective city</p>
</li>
<li>
<p>and are aimed at adults (purely out of self-interest).</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="graz">Graz</h2>
<div class="sectionbody">
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Club</th>
<th class="tableblock halign-left valign-top">Current head</th>
<th class="tableblock halign-left valign-top">Sessions</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://www.aikidopro.at/">Aikido PRO</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Valentin Lasnik</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Tue 16:30, Fri 18:00</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://aikikai-graz.at/">Aikikai Graz</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Peter Poltsch</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Tue 18:30, Wed 17:00, Thu 20:00, Fri 18:00</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://www.aikidograz.at/">ASKÖ Aikido Graz</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Günther Steger</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Tue 19:00, Thu 19:30, Fri 17:00</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://www.aikido-graz.at/">Aikido Union Graz</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Frank Koren</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Mon 19:00, Wed 18:30</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="http://www.sobukan.at/">Sobukan Union Graz</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Andreas Schoch</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Thu 19:00</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect1">
<h2 id="salzburg">Salzburg</h2>
<div class="sectionbody">
<table class="tableblock frame-all grid-all stretch">
<colgroup>
<col style="width: 33.3333%;">
<col style="width: 33.3333%;">
<col style="width: 33.3334%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Club</th>
<th class="tableblock halign-left valign-top">Current head</th>
<th class="tableblock halign-left valign-top">Sessions</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://aikido-salzburg.at/">Ko Jun Dojo - Aikido Union Salzburg</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Bruno Wintersteller</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Mon 19:30, Thu 19:00, Sat 10:00</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://www.enshiro.at/">Enshiro Dojo Salzburg</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Harald Paßrucker</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Mon 19:30, (Fri 20:00)</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://www.tanden-aikido.at/">Tanden Dojo Salzburg</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Alexander Ermakov</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Mon 20:15, Wed 20:15</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="http://www.kenbukai-salzburg.at/">Kenbukai Salzburg</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Ute Schwarzmayr</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Tue 19:00, Fri 08:00</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect1">
<h2 id="zusammenfassung">Summary</h2>
<div class="sectionbody">
<div class="paragraph">
<p>When I tell outsiders about Aikidō and describe it as a niche sport or niche art, they are always surprised how many clubs exist nevertheless. I compiled the list above purely through internet research, and I may even have overlooked some. In this sense, Aikidō is really alive and well.</p>
</div>
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="sports" /><category term="Aikidō" /><summary type="html"><![CDATA[Motivation]]></summary></entry><entry xml:lang="en"><title type="html">syntok release</title><link href="https://lukas-prokop.at/articles/2024-12-19-syntok-release" rel="alternate" type="text/html" title="syntok release" /><published>2024-12-19T00:01:00+01:00</published><updated>2025-01-16T00:40:12+01:00</updated><id>https://lukas-prokop.at/articles/2024-12-19-syntok-release</id><content type="html" xml:base="https://lukas-prokop.at/articles/2024-12-19-syntok-release"><![CDATA[<div class="sect1">
<h2 id="motivation">Motivation</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Assume you have a program to understand desired markup languages. Assume you have a typesetting engine. Assume you have components to generate desired output formats like PDF, EPUB, HTML5, and so on. By combining these tools, you get a program for your digital typesetting needs, right?</p>
</div>
<div class="paragraph">
<p>No, you will soon recognize that syntax highlighting is a crucial element of such systems. Programmers want their generated documents to feature syntax highlighting. Indeed, source code is often terrible to read without syntax highlighting. Can one easily distinguish types from identifiers? Can one identify substructures if the syntax does not require an <a href="https://en.wikipedia.org/wiki/Off-side_rule">off-side rule</a>? One could certainly pull in one of the many syntax highlighting library efforts, but isn&#8217;t this overkill and too heavy a dependency?</p>
</div>
<div class="paragraph">
<p>As a result, I thought about a building block: a serialization format which encodes how source code is split into tokens.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="syntok">Syntok</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Let me introduce <em>syntok</em>: serialized tokenization of syntax.</p>
</div>
<div class="paragraph">
<p>Consider the following example C++ program:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="pygments highlight" style="background: #f8f8f8;"><code data-lang="cpp"><span></span><span style="color: #9C6500">#include</span><span style="color: #bbbbbb"> </span><span style="color: #3D7B7B; font-style: italic">&lt;iostream&gt;</span>

<span style="color: #B00040">int</span><span style="color: #bbbbbb"> </span><span style="color: #0000FF">main</span>()<span style="color: #bbbbbb"> </span>{
<span style="color: #bbbbbb">	</span>std<span style="color: #666666">::</span>cout<span style="color: #bbbbbb"> </span><span style="color: #666666">&lt;&lt;</span><span style="color: #bbbbbb"> </span><span style="color: #BA2121">&quot;hello &quot;</span><span style="color: #bbbbbb"> </span><span style="color: #666666">&lt;&lt;</span><span style="color: #bbbbbb"> </span>([](<span style="color: #B00040">void</span>){<span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">return</span><span style="color: #bbbbbb"> </span><span style="color: #BA2121">&quot;world!&quot;</span>;<span style="color: #bbbbbb"> </span>})()<span style="color: #bbbbbb"> </span><span style="color: #666666">&lt;&lt;</span><span style="color: #bbbbbb"> </span>std<span style="color: #666666">::</span>endl;
<span style="color: #bbbbbb">	</span><span style="color: #008000; font-weight: bold">return</span><span style="color: #bbbbbb"> </span><span style="color: #666666">0</span>;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>In my weblog, this source code appears colorized (also called “syntax highlighted”) to make it more readable. This is possible because a Ruby binding to <a href="https://pygments.org/">pygments</a> is utilized to generate a colorized HTML version from the code snippet. But wait… isn&#8217;t this annoying? I need a Ruby binding to run some Python software to generate HTML output?! Or in the case of <a href="https://tree-sitter.github.io/tree-sitter/">tree-sitter</a>, I need C and JavaScript to generate HTML or XML output. Would it not be nice to just take a file which encodes the individual tokens and let the software decide the remaining colorization parts? And if pygments and tree-sitter can emit these tokens, we can use them interchangeably.</p>
</div>
<div class="paragraph">
<p>Thus, instead of one tool covering the entire pipeline of reading some syntax and generating some specific output format, I want to split the pipeline up. One tool reads syntax and generates a syntok file. One tool reads the syntok file and generates the output format.</p>
</div>
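<div class="paragraph">
<p>To make the second half of this split more tangible, here is a small Python sketch of a consumer that reads a syntok document and emits HTML. It relies only on what the syntok example in this article shows (the namespace, <code>item</code> elements, and their <code>category</code> attribute). The function name <code>syntok_to_html</code> and the mapping of categories to <code>class</code> attributes are my own assumptions, not part of syntok.</p>
</div>

```python
import html
import xml.etree.ElementTree as ET

# Namespace as it appears in the syntok example of this article.
SYNTOK_NS = "{https://spec.typho.org/syntok/1.0/xml-schema}"


def syntok_to_html(syntok_xml):
    """Render every item of a syntok document as an HTML span whose
    class attribute is the token category (a hypothetical mapping)."""
    root = ET.fromstring(syntok_xml)
    spans = []
    for item in root.iter(SYNTOK_NS + "item"):
        category = html.escape(item.get("category", ""), quote=True)
        text = html.escape(item.text or "")
        spans.append(f'<span class="{category}">{text}</span>')
    return "".join(spans)
```

<div class="paragraph">
<p>The consumer knows nothing about C++ or any other programming language. It only maps token categories to a presentation, which is exactly the part an output-format tool should own.</p>
</div>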
<div class="paragraph">
<p>For the C++ example program above, the following file can be the corresponding syntok file:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="pygments highlight" style="background: #f8f8f8;"><code data-lang="xml"><span></span><span style="color: #9C6500">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;</span>
<span style="color: #008000; font-weight: bold">&lt;syntok</span><span style="color: #bbbbbb"> </span><span style="color: #687822">xmlns=</span><span style="color: #BA2121">&quot;https://spec.typho.org/syntok/1.0/xml-schema&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;preprocessor-instruction&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;0&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;8&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>#include<span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;system-library-ref&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;9&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;18&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span><span style="color: #717171; font-weight: bold">&amp;lt;</span>iostream<span style="color: #717171; font-weight: bold">&amp;gt;</span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;whitespace&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;19&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;20&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>

<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;type&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;21&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;23&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>int<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;whitespace&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;24&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;24&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span><span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;identifier&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;25&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;28&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>main<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;parameter-list&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;29&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;30&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>()<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;31&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;34&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span><span style="color: #bbbbbb"> </span>{
<span style="color: #bbbbbb">        </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;namespace&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;35&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;37&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>std<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;38&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;39&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>::<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;identifier&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;40&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;43&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>cout<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;44&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;48&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span><span style="color: #bbbbbb"> </span><span style="color: #717171; font-weight: bold">&amp;lt;&amp;lt;</span><span style="color: #bbbbbb"> </span>&quot;<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;string&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;49&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;54&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>hello<span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;55&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;63&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>&quot;<span style="color: #bbbbbb"> </span><span style="color: #717171; font-weight: bold">&amp;lt;&amp;lt;</span><span style="color: #bbbbbb"> </span>([](<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;type&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;64&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;67&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>void<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;68&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;70&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>){<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;keyword&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;71&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;77&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span><span style="color: #bbbbbb"> </span>return<span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;string&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;78&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;85&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>&quot;world!&quot;<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;86&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;95&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>;<span style="color: #bbbbbb"> </span>})()<span style="color: #bbbbbb"> </span><span style="color: #717171; font-weight: bold">&amp;lt;&amp;lt;</span><span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;namespace&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;96&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;98&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>std<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;99&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;100&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>::<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;identifier&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;101&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;104&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>endl<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;105&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;105&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>;<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;keyword&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;106&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;114&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>
<span style="color: #bbbbbb">        </span>return<span style="color: #bbbbbb"> </span><span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;integer&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;115&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;115&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>0<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #bbbbbb">  </span><span style="color: #008000; font-weight: bold">&lt;item</span><span style="color: #bbbbbb"> </span><span style="color: #687822">category=</span><span style="color: #BA2121">&quot;operator&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">start=</span><span style="color: #BA2121">&quot;116&quot;</span><span style="color: #bbbbbb"> </span><span style="color: #687822">end=</span><span style="color: #BA2121">&quot;118&quot;</span><span style="color: #008000; font-weight: bold">&gt;</span>;
}<span style="color: #008000; font-weight: bold">&lt;/item&gt;</span>
<span style="color: #008000; font-weight: bold">&lt;/syntok&gt;</span></code></pre>
</div>
</div>
<div class="paragraph">
<p>A syntok file is an XML file (file extension <code>.synt</code>) with a root element <code>syntok</code> that contains <code>item</code> elements for the individual tokens. <code>start</code> and <code>end</code> document the byte offsets and, crucially, <code>category</code> associates a category with the token. Now, I would like to point out three obvious points:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The set of categories (“data model”) can be selected by the tokenizer itself. For a short period of time in the beginning, I assumed I could contribute a data model for tokenization. But remember, for example, that many syntaxes do not have a “namespace” category but introduce arbitrary other syntactic elements (e.g. python). It is impossible to contribute such a generic taxonomy. Instead, I have to rely upon a folksonomy.</p>
</li>
<li>
<p>In general, the category-to-syntax-highlighting-color association needs to be contributed externally. But of course, one can trivially just hash the category name and pick a color based on the hash (this is what I did in my example programs … and certainly it does not always lead to beautiful colorization!).</p>
</li>
<li>
<p>The quality of tokenization is allowed to vary. What about the final operator-categorized item? Why are ';' and '}' not split up by a whitespace-categorized item? Simply put, because for 99% of applications, whitespace won&#8217;t have a special style (e.g. different background color). So the given quality suffices. And indeed, a better tokenizer hopefully splits them up to satisfy even more applications.</p>
</li>
</ul>
</div>
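<div class="paragraph">
<p>The hashing idea from the second point can be sketched in a few lines. This is only an illustration and not the code of my example programs: the function name <code>category_color</code> and the choice of SHA-256 are assumptions made for this sketch.</p>
</div>

```python
import hashlib


def category_color(category: str) -> str:
    """Map a category name to a stable '#rrggbb' color (hypothetical helper)."""
    # hashlib yields the same digest on every run, unlike the built-in hash(),
    # which is randomized per process for strings.
    digest = hashlib.sha256(category.encode("utf-8")).digest()
    red, green, blue = digest[0], digest[1], digest[2]
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


for name in ("keyword", "string", "whitespace"):
    print(name, category_color(name))
```

<div class="paragraph">
<p>The same category always gets the same color, but nothing prevents two categories from ending up with similar or unreadable colors.</p>
</div>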
<div class="paragraph">
<p>One immediate advantage is that a tool can generate the syntok file, and the user can then intervene and adjust the tokenization (or add additional markup) before it gets rendered. This solves a common difficulty I experienced in the LaTeχ package world. If my source code deviates slightly from the supported syntax (often happens with ASM, happened when python highlighting did not yet have python3 support, SQL versus PL/SQL, …), the software will irrevocably represent the syntax erroneously.</p>
</div>
<div class="paragraph">
<p>One of the requirements is that the entire file is tokenized. So the <code>start</code> and <code>end</code> attributes provide a partition<sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnotedef_1" title="View footnote.">1</a>]</sup>. Recognize that the syntax is linear and flat. It does not represent the hierarchical structure often found in markup languages and programming languages.</p>
</div>
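<div class="paragraph">
<p>The partition requirement can be checked mechanically. The following is a minimal sketch, not the validation script from the repository. It assumes, as the example above suggests, that offsets are zero-based, that <code>end</code> is inclusive and that items appear in document order. The name <code>is_partition</code> is made up for this sketch.</p>
</div>

```python
def is_partition(spans, total_bytes):
    """Return True if the (start, end) pairs cover bytes 0..total_bytes-1 exactly once.

    Assumes zero-based offsets, an inclusive end and items in document order.
    """
    expected_start = 0
    for start, end in spans:
        # A gap or an overlap shows up as an unexpected start offset.
        if start != expected_start or start > end:
            return False
        expected_start = end + 1
    # The last item has to end on the last byte of the file.
    return expected_start == total_bytes


print(is_partition([(0, 2), (3, 3), (4, 7)], 8))  # contiguous and complete
print(is_partition([(0, 2), (4, 7)], 8))          # byte 3 is not covered
```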
</div>
</div>
<div class="sect1">
<h2 id="the-specification">The specification</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The specification document was written in AsciiDoc and is readable in this git repository:</p>
</div>
<div class="paragraph">
<p><a href="https://github.com/typho/syntok">syntok</a></p>
</div>
<div class="paragraph">
<p>Furthermore, it comes with a bunch of tools I used while using the standard in production.
Most importantly:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>A <a href="https://github.com/typho/syntok/blob/main/tools/syntok-colorize-cli.py">python script</a> taking a <code>.synt</code> file to generate colorized CLI output</p>
</li>
<li>
<p>A <a href="https://github.com/typho/syntok/blob/main/tools/syntok-colorize-web.html">JavaScript-powered webpage</a> taking a <code>.synt</code> file to generate colorized output on an HTML page</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Furthermore:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>An <a href="https://github.com/typho/syntok/blob/main/tools/tools/validate-syntok-v1.xsd">XSD file</a> to verify some properties of a syntok file</p>
</li>
<li>
<p>A <a href="https://github.com/typho/syntok/blob/main/tools/tools/validate-syntok-v1.py">python script</a> to verify remaining properties of a syntok file</p>
</li>
<li>
<p>A <a href="https://github.com/typho/syntok/blob/main/tools/template-by-unicode-categories.py">python script</a> to generate a syntok template from Unicode categories</p>
</li>
<li>
<p>A <a href="https://github.com/typho/syntok/blob/main/tools/tree-sitter-to-syntok.py">python script</a> taking a tree-sitter dump and the original file to generate the syntok file</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="f-a-q">F.A.Q.</h2>
<div class="sectionbody">
<div class="dlist">
<dl>
<dt class="hdlist1">Why XML?</dt>
<dd>
<p>I think JSON and XML have the broadest support as data serialization formats to be written and read. YAML never gained sufficient traction (without reciting the reasons here). I looked at XML and JSON and recognized that writing XML is much simpler because of its simple escaping rules. Recognize that the user-provided content can be arbitrary (even binary) and in these cases, I would not dare to write my own JSON writer in C or assembly, but I would do so for XML (in fact, I did back at university).</p>
</dd>
<dt class="hdlist1">Why didn&#8217;t you allow both formats?</dt>
<dd>
<p>There was one unpublished version specifying JSON as well as XML serialization. In the end, I felt like this fragments the topic unnecessarily and makes it difficult for tooling providers.</p>
</dd>
<dt class="hdlist1">Why document start/end?</dt>
<dd>
<p>Since people often ignore that XML is whitespace-sensitive, I think it can easily happen that someone introduces content accidentally. Then the original content gets lost. When someone wants to debug this situation, having the start/end attributes helps a lot. I admit, it makes it more difficult to adjust a syntok file manually.</p>
</dd>
</dl>
</div>
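<div class="paragraph">
<p>To illustrate the debugging argument for start/end: a small sketch that reports items whose content length does not match the documented span. It is not one of the repository tools. It assumes UTF-8 content, an inclusive <code>end</code> offset and <code>item</code> elements without an XML namespace. The function name <code>find_length_mismatches</code> is hypothetical.</p>
</div>

```python
import xml.etree.ElementTree as ET


def find_length_mismatches(root):
    """Yield (start, end, actual_length) for items whose text does not fit its span."""
    for item in root.iter("item"):
        start = int(item.get("start"))
        end = int(item.get("end"))
        # Offsets are byte offsets, so measure the encoded text (UTF-8 assumed).
        actual = len((item.text or "").encode("utf-8"))
        if actual != end - start + 1:
            yield start, end, actual


# Build a tiny document through the API instead of parsing a file.
root = ET.Element("syntok")
ok = ET.SubElement(root, "item", category="type", start="21", end="23")
ok.text = "int"
bad = ET.SubElement(root, "item", category="identifier", start="25", end="28")
bad.text = "main\n  "  # whitespace slipped in, e.g. through a pretty-printer

print(list(find_length_mismatches(root)))  # prints [(25, 28, 7)]
```

<div class="paragraph">
<p>Without the two attributes, the accidentally added whitespace would be indistinguishable from original content.</p>
</div>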
</div>
</div>
<div class="sect1">
<h2 id="conclusion">Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>I hope this standard helps to ease syntax highlighting in digital typesetting. Of course, it needs proper support by syntax highlighting libraries or, even better, by parser authors.</p>
</div>
<div class="paragraph">
<p>Let syntax be tokenizable!</p>
</div>
</div>
</div>
<div id="footnotes">
<hr>
<div class="footnote" id="_footnotedef_1">
<a href="#_footnoteref_1">1</a>. In the mathematical sense. Colloquially, this would be called “complete”.
</div>
</div>]]></content><author><name>Lukas Prokop</name></author><category term="cs" /><category term="software-development" /><category term="digital-typesetting" /><summary type="html"><![CDATA[Motivation]]></summary></entry></feed>