Papers
I read a bunch of papers. This peaked during my PhD studies, but I enjoyed good papers before and after that. I systematically review and summarize them. Here are my notes.
Table of contents:
- § A Masked Ring-LWE Implementation
- § A Practical Analysis of Rust’s Concurrency Story
- § A Provably Secure True Random Number Generator with Built-In Tolerance to Activ…
- § A Replication Study on Measuring the Growth of Open Source
- § A Side-channel Resistant Implementation of SABER
- § A Sound Method for Switching between Boolean and Arithmetic Masking
- § A system for typesetting mathematics
- § Additively Homomorphic Ring-LWE Masking
- § Aggregated Private Information Retrieval
- § BasicBlocker: Redesigning ISAs to Eliminate Speculative-Execution Attacks
- § Benchmarking Post-quantum Cryptography in TLS
- § Crouching error, hidden markup [Microsoft Word]
- § Cryptanalysis of ring-LWE based key exchange with key share reuse
- § Cryptographic competitions
- § Cyclone: A safe dialect of C
- § Detecting Unsafe Raw Pointer Dereferencing Behavior in Rust
- § EWD1300: The Notational Conventions I Adopted, and Why
- § Engineering a sort function
- § Everything Old is New Again: Binary Security of WebAssembly
- § FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Emb…
- § High-speed Instruction-set Coprocessor for Lattice-based Key Encapsulation Mech…
- § Historical Notes on the Fast Fourier Transform
- § How Usable are Rust Cryptography APIs?
- § In defense of PowerPoint
- § Languages and the computing profession
- § McBits: Fast Constant-Time Code-Based Cryptography
- § NTRU: A ring-based public key cryptosystem
- § New directions in cryptography
- § Number "Not" Used Once - Key Recovery Fault Attacks on LWE Based Lattice Crypto…
- § On the Security of Password Manager Database Formats
- § On the criteria to be used in decomposing systems into modules
- § PDF/A considered harmful for digital preservation
- § Piret and Quisquater’s DFA on AES Revisited
- § Power analysis attack on Kyber
- § Practical CCA2-Secure and Masked Ring-LWE Implementation
- § Practical Evaluation of Masking for NTRUEncrypt on ARM Cortex-M4
- § Region-Based Memory Management in Cyclone
- § SEVurity: No Security Without Integrity
- § SOK: On the Analysis of Web Browser Security
- § Scribble: Closing the Book on Ad Hoc Documentation Tools
- § Seven great blunders of the computing world
- § Software-based Power Side-Channel Attacks on x86
- § Some instructive mathematical errors
- § Templates vs. Stochastic Methods: A Performance Analysis for Side Channel Crypt…
- § The Aesthetics of Reading
- § The Case of Correlatives: A Comparison between Natural and Planned Languages
- § The European Union and the Semantic Web
- § The Implementation of Lua 5.0
- § The Profession as a Culture Killer
- § The Security Risk of Lacking Compiler Protection in WebAssembly
- § The UNIX Time- Sharing System
- § The design of a Unicode font
- § The problem with unicode
- § Too Much Crypto
- § Toward decent text encoding
- § Tweaks and Keys for Block Ciphers: The TWEAKEY Framework
- § Underproduction: An Approach for Measuring Risk in Open Source Software
- § Vnodes: An Architecture for Multiple File System Types in Sun UNIX
- § When a Patch is Not Enough - HardFails: Software-Exploitable Hardware Bugs
- § You Really Shouldn't Roll Your Own Crypto: An Empirical Study of Vulnerabilitie…
- § π is the Minimum Value for Pi
Papers and notes
A Masked Ring-LWE Implementation §
Title: “A Masked Ring-LWE Implementation” by Oscar Reparaz, Sujoy Sinha Roy, Frederik Vercauteren, Ingrid Verbauwhede [url] [dblp]
Published in 2015 at CHES 2015 and I read it in 2020/07
Abstract: Lattice-based cryptography has been proposed as a postquantum public-key cryptosystem. In this paper, we present a masked ring-LWE decryption implementation resistant to first-order side-channel attacks. Our solution has the peculiarity that the entire computation is performed in the masked domain. This is achieved thanks to a new, bespoke masked decoder implementation. The output of the ring-LWE decryption are Boolean shares suitable for derivation of a symmetric key. We have implemented a hardware architecture of the masked ring-LWE processor on a Virtex-II FPGA, and have performed side channel analysis to confirm the soundness of our approach. The area of the protected architecture is around 2000 LUTs, a 20 % increase with respect to the unprotected architecture. The protected implementation takes 7478 cycles to compute, which is only a factor ×2.6 larger than the unprotected implementation.
ambiguity
\[ \operatorname{th}(x) = \begin{cases} 0 & \text{if } x \in (0, q/4) \cup (3q/4, q) \\ 1 & \text{if } x \in (q/4, 3q/4) \end{cases} \]
The intervals apparently should denote inclusive-exclusive boundaries.
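Reading the intervals as inclusive-exclusive, this is how I understand th(·); a minimal sketch (the boundary convention and the example modulus are my assumptions, not spelled out in the paper):

```python
def th(x, q):
    """Threshold decoder, read with inclusive-exclusive boundaries:
    x in [q/4, 3q/4) decodes to 1; x in [0, q/4) or [3q/4, q) decodes to 0."""
    return 1 if q // 4 <= x % q < 3 * q // 4 else 0

# e.g. th(10, 1024) == 0 and th(512, 1024) == 1
```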
error
On page 686, c1 and c2 are swapped erroneously throughout. On page 685, c2 is defined such that it contains the message. Consequently, the factor r2 must be applied to c1, not c2. However, page 686 repeatedly multiplies c2 with the different r values.
Masked Decoder table
Sorry, I don't have table support in Zotero.
The first line is meant to be read as “If a' is in quadrant (1) and a'' is in quadrant (1), then a is inconclusive (∅) rather than bit 0 or 1”.
a' a'' a
1 1 ∅
1 2 1
1 3 ∅
1 4 0
2 1 1
2 2 ∅
2 3 0
2 4 ∅
3 1 ∅
3 2 0
3 3 ∅
3 4 1
4 1 0
4 2 ∅
4 3 1
4 4 ∅
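To convince myself that the rule table is consistent, a small sketch of the decoder idea (quadrant lookup plus Δ-shift retry loop). The concrete Δi values are in the unavailable extended version, so this sketch simply uses fresh random shifts, and it returns the decoded bit directly instead of keeping it in Boolean shares as the real masked decoder does:

```python
import random

q, N = 7681, 16                      # toy modulus (my choice) and the paper's N = 16 retries

def quadrant(x):
    return 4 * x // q + 1            # quadrant 1..4 of a coefficient in [0, q)

# the 8 conclusive rows of the table above: (quadrant(a'), quadrant(a'')) -> th(a)
RULES = {(1, 2): 1, (1, 4): 0, (2, 1): 1, (2, 3): 0,
         (3, 2): 0, (3, 4): 1, (4, 1): 0, (4, 3): 1}

def masked_decode(a1, a2, rng=random.Random(0)):
    """Decide th(a1 + a2 mod q) from the quadrants of the two shares only.
    Inconclusive pairs are re-randomized by shifting both shares with a Delta
    (which keeps a1 + a2 mod q unchanged) and retrying, up to N times."""
    for _ in range(N):
        key = (quadrant(a1), quadrant(a2))
        if key in RULES:
            return RULES[key]
        delta = rng.randrange(q)
        a1, a2 = (a1 + delta) % q, (a2 - delta) % q
    return 0                          # give up; this is where the decoding errors come from

# sanity check: share a coefficient and decode it through the shares only
a = 1234
a1 = random.randrange(q)
a2 = (a - a1) % q
assert masked_decode(a1, a2) == (1 if q // 4 <= a < 3 * q // 4 else 0)
```

With random shifts, roughly half of the retries are conclusive, which matches the intuition that N = 16 iterations push the residual error rate down to a very small value.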
quotes
- “The area of the protected architecture is around 2000 LUTs, a 20 % increase with respect to the unprotected architecture. The protected implementation takes 7478 cycles to compute, which is only a factor ×2.6 larger than the unprotected implementation.”
- “So far, the reported implementations have focused mainly on efficient implementation strategies, and very little research work has appeared in the area of side channel security of the lattice-based schemes.”
- “Most notably, masking is both a provably sound and popular in industry.”
- “However, there are not many masking schemes specifically designed for postquantum cryptography. In Brenner et al. present a masked FPGA implementation of the post-quantum pseudo-random function SPRING.”
- “In the rest of the paper, we focus on protecting the ring-LWE decryption operation against side-channel attacks with masking. The decryption algorithm is considerably exposed to DPA attacks since it repeatedly uses long-term private keys. In contrast, the encryption or key-generation procedures use ephemeral secrets only”
- “We analyze the error rates of the decryption operation in Sect. 6 and apply error correcting codes.”
- “the message is first lifted to a ring element m̄ ∈ Rq by multiplying the message bits by q/2.”
- “The most natural way to split the computation of the decryption as Eq. 2 is to split the secret polynomial r additively into two shares r' and r'' such that r[i] = r'[i] + r''[i] (mod q) for all i.”
- “The final threshold th(·) operation of Eq. 2 is obviously non-linear in the base field Fq, and hence cannot be independently applied to each branch”
- “For instance, in [4] an approach based on masked tables was used.”
- “We design a bespoke masked decoder that results in a very compact implementation.”
- “a denotes a single coefficient and (a', a'') its shares such that a' + a'' = a (mod q).”
- “In roughly half of the cases, we can apply one of the 8 rules previously described to deduce the value of th(a).”
- “a' ← a' + Δ1 and a'' ← a'' − Δ1 for certain Δ1”
- “See the extended version of this paper for exemplary values of Δi.” → http://www.reparaz.net/oscar/ches2015-lwe → 404 Not Found
- Masked Table Lookup: “This is a well-studied problem that arises in other situations (for instance, when masking the sbox lookup in a typical block cipher) and there are plenty of approaches here to implement such masked table lookup. We opted for the approach of masked tables as in [26].”
- “The usual precautions are applied when implementing f. For our target FPGA platform, we carefully split the 7-bit input to 1-bit output function f into a balanced tree of 4-bit input LUTs, in such a way that any intermediate input or output of LUTs does not leak in the first order.”
- “In Table 1, we can see that the proposed masking of the ring-LWE architecture incurs an additional area overhead of only 301 LUTs and 129 FFs in comparison to the unprotected version.”
- “we could straightforward reduce the additional area cost by reusing the 13-bit addition and subtraction circuits present in the arithmetic coprocessor. […] For simplicity, we did not implement this approach.”
- “Thus in total, a masked decryption operation requires 7478 cycles. The arithmetic coprocessor and the masked decoder run in constant time and constant flow.”
- “We point out that the approach laid out in Sect. 3 scales quite well with the security order. To achieve security at level d+1, one would need to split the computation of Eq. 2 into d branches analogously to Eq. 3.”
- “We provide a very advantageous setting for the adversary: we assume that the evaluator knows the details about the implementation (for example, pipeline stages).”
- “The evaluation methodology to test if the masking is sound is as follows. We first proceed with first-order key-recovery attacks when the randomness source (PRNG) is switched off. […] Then we switch on the PRNG and repeat the attacks. If the masking is sound, the first-order attacks shall not succeed. In addition, we perform second-order attacks to confirm that the previous first-order analyses were carried out with enough traces.”
- “We can see that starting from ≈2000 measurements this second-order attack is successful.”
- “We remark that the relatively low number of traces required for the second-order attack is due to the very friendly scenario for the evaluator. The platform is low noise and no other countermeasure except than masking was implemented.”
summary
Nice paper. It is sad that c1 and c2 got mixed up on page 686. The idea to mask is indeed natural: r2 = r2' + r2'' (mod q), and in the decoder a' := a' + Δ1 and a'' := a'' − Δ1. Isn't sampling also a problem when masking ring-LWE? If so, the title is inappropriate and should be “A Masked Ring-LWE Decoder Implementation”. The described implementation covers CPA security only. It gives a factor of 2.679 in terms of CPU cycles and 3.2 in terms of runtime in microseconds for the decryption step. With N=16 (the maximum number of iterations in the decoder), you get 1 error in about 400,000 bits. This means that with NewHope's 256-bit encapsulated keys, you get 1 error in 1420 messages. For a long time I did not understand why the masked decoder requires the value of the previous iteration (loop in Figure 3); then I recognized it. I don't like the fact that the source code was not published with the paper (→ reproducible research).
typo
- “In our implementation, N = 16 iterations produces a satisfactory result.”
- “essentially maps the output of each quadrant qi' and qi'' (2 bits each) after the i-the iteration”
A Practical Analysis of Rust’s Concurrency Story §
Title: “A Practical Analysis of Rust’s Concurrency Story” by Aditya Saligrama, Andrew Shen, Jon Gjengset [url] [dblp]
Published in 2019 and I read it in 2021-11
Abstract: Correct concurrent programs are difficult to write; when multiple threads mutate shared data, they may lose writes, corrupt data, or produce erratic program behavior. While many of the data-race issues with concurrency can be avoided by the placing of locks throughout the code, these often serialize program execution, and can significantly slow down performance-critical applications. Programmers also make mistakes, and often forget locks in less-executed code paths, which leads to programs that misbehave only in rare situations.
quotes
- “In this work, we examine how this aspect of Rust’s type system impacts the development and refinement of a concurrent data structure, as well as its ability to adapt to situations when correctness is guaranteed by lower-level invariants (e.g., in lock-free algorithms) that are not directly expressible in the type system itself. We detail the implementation of a concurrent lock-free hashmap in order to describe these traits of the Rust language. Our code is publicly available at https://github.com/saligrama/concache and is one of the fastest concurrent hashmaps for the Rust language, which leads to mitigating bottlenecks in concurrent programs.”
- “The primary cause of headache in concurrent programs is data races. A data race occurs when two threads attempt to read and write the same memory location at the same time.”
- “Lock-free data structures are data structures that have been designed specifically for concurrent access. Operations on such data structures do not generally need to take locks, and can proceed concurrently even when there are many readers and many writers.”
- “Code that is marked as unsafe is allowed to alias and typecast pointers (though it must be valid Rust in every other way), which is sufficient to implement any concurrent algorithm.”
- “[…] we implemented several concurrent hashmap designs of increasing sophistication and evaluated their performance. We started out using the hashmap provided by the standard library, wrapped in a reference-counted reader-writer lock, which, as expected, exhibited poor multi-core scalability.”
- “In Rust, ‘safety’ is generally defined as not being susceptible to undefined behavior, which occurs when compilers make certain assumptions that are not satisfied during execution.”
- “In Rust, every variable is owned by some scope. When a scope ends, it is responsible for cleaning up any resources used by the variables that it owns.”
- “Each variable can only have one owner at a given time, but ownership can be passed to other scopes through function calls or returns.”
- “Rust allows the owner of a variable to give out temporary references to a variable (this is called borrowing the variable).”
- “While the borrow checker (the part of the compiler that checks that all references are valid) guarantees that there are no data-races, additional mechanisms are needed to ensure that multithreaded programs behave correctly.”
- “Rust also has some types that provide interior mutability : certain types allow you to modify a variable even if you only have an immutable reference to it. For example, the Cell type allows you to swap the value of a variable through a &Cell , which is safe as long as the Cell is only accessed from a single thread.”
- “A type that is Send can safely be sent to another thread (that is, its ownership can be passed across thread boundaries), whereas a type that is Sync can safely be accessed from another thread (that is, a reference to it can be passed across thread boundaries). A type whose members are all Send is itself Send, and the same applies to Sync. A non-atomic reference-counted variable (Rc in the standard library) is neither Send nor Sync, whereas a Cell is Send but not Sync. Rust requires code that is spawned on a new thread to be Send, which ensures that threads are not able to access shared data unless that data is contained in a structure that allows concurrent access.”
- “A code block that is marked as unsafe is allowed to create raw pointers — pointers without an associated lifetime — and cast them to different types, or back to regular references. This allows the developer to maintain multiple mutable pointers to the same data, and expose them as mutable references when their manually-checked invariants indicate that doing so is safe. This is necessary to implement, e.g., a lock, which exposes a mutable reference only when it has checked that there is no-one else holding the lock.”
- “Users of a library that contains unsafe code do not have to mark their own code as unsafe; the library authors effectively promise that their library provides a safe external interface.”
- “Conversely, in other languages such as Go, where the lock is separate from the inner type and threads are not forced to take the lock to access data.”
- “It is also very useful as many concurrent algorithms are written in C-style pseudo code making it easier to port these algorithms into Rust.”
- “When writing concurrent code, you often need to temporarily violate the Rust safety restrictions, or guarantee them through invariants that the compiler cannot check (e.g., multiple mutable pointers to the same data, or pointer manipulations).”
- “Conversely, auto-free tends to be somewhat difficult when writing unsafe code as we must be careful to not accidentally drop a temporarily owned item (in our case, through Box::from_raw).”
- via mem::forget Rust's safety guarantees don't include a guarantee that destructors are always run
- “We observed a tendency to overuse unwrap() while prototyping as it was easier to do so, shortening the length of the code.”
- “While we recognize that robust error handling is difficult to implement, we would appreciate shorter and more efficient ways to transition from prototype to final code in terms of error handling.”
- “We would ideally like to see a compiler feature, perhaps in cargo lint, that could detect if atomics are not necessary (i.e., accessed by only one thread).”
- fn foo(p: &mut *mut usize)
“This parameter is fairly confusing and it can be difficult to determine the purpose of the mut, *, and &. For a beginner with little experience with Rust’s pointer and reference types, many questions arise with this sort of function parameter. What is the difference between the two muts? How does this type correspond to variable p? The overall complexity of the parameter creates confusion and can make it difficult to read and write Rust function parameters.”
- “When writing our concurrent hashmap, we did not use any other Ordering except Ordering::SeqCst, in part because the meaning of the different orderings were not clear to us. It is difficult to understand the implications of any particular Ordering, even after reading the documentation.”
summary
A nice report about the implementation of a concurrent hashmap in Rust. The basic primitives of Rust are well explained, but the lock-free design of the hash map is not discussed at all. The relative performance characteristics are presented in a figure without an appropriate caption, and nice examples are provided for struggles with Rust. Overall a nice bachelor/master-level project outcome.
- Performance charts should mention “higher is better” and should be better placed
- The examples (e.g. auto-free, unhelpful compiler errors) are very good!
- println!() in get_val should be eprintln!()
- It is pretty severe for an academic paper to skip the semantics of memory orderings if they are “difficult to understand”
- I couldn't understand why there are two search functions in the section “unnecessary use of lifetimes”
A Provably Secure True Random Number Generator with Bu… §
Title: “A Provably Secure True Random Number Generator with Built-In Tolerance to Active Attacks” by Berk Sunar, William Martin, Douglas Stinson [url] [dblp]
Published in 2007-01 and I read it in 2021-11-08
Abstract: This paper is a contribution to the theory of true random number generators based on sampling phase jitter in oscillator rings. After discussing several misconceptions and apparently insurmountable obstacles, we propose a general model which, under mild assumptions, will generate provably random bits with some tolerance to adversarial manipulation and running in the megabit-persecond range. A key idea throughout the paper is the fill rate, which measures the fraction of the time domain in which the analog output signal is arguably random. Our study shows that an exponential increase in the number of oscillators is required to obtain a constant factor improvement in the fill rate. Yet, we overcome this problem by introducing a postprocessing step which consists of an application of an appropriate resilient function. These allow the designer to extract random samples only from a signal with only moderate fill rate and, therefore, many fewer oscillators than in other designs. Last, we develop fault-attack models and we employ the properties of resilient functions to withstand such attacks. All of our analysis is based on rigorous methods, enabling us to develop a framework in which we accurately quantify the performance and the degree of resilience of the design.
clarifications and questions
- “Moreover, the entire motivation for the relatively prime ring length model is unclear.”
But the motivation was provided above! Namely, “[…] It has been proposed that, in order to fill as many urns as possible and in order to make the behavior of the r rings as independent as possible, the ring lengths n1, …, nr should be pairwise relatively prime integers.”
- “Thus, this feature of the model is not only impractical, but its value to the model is questionable to begin with.” ⇒ why questionable? (too little justification)
- It would be nice to see variable d in section 5 be defined explicitly.
- The actual argument against relatively prime ring lengths can be found at the start of section 6.2
- r ≈ N log N … here (in section 6.3) log grows slower than reciprocal sum by about 0.57
- Coupon collector's problem:
“Let N>0 be the number of urns. Let 0<p<1 be the confidence level. Determine the minimum number of rings such that at least N are filled with probability ≥ p” (a quick simulation sketch follows after this list)
- After Theorem 1, N log₂ N is suddenly mentioned with log base 2, which used to be the natural logarithm?! (making the error of the estimate larger)
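The simulation sketch referenced above: a Monte Carlo check of M(N, f, p) under my own simplification that every ring occupies one uniformly random urn (ring lengths are ignored entirely):

```python
import random

random.seed(0)
N, f, p = 100, 0.60, 0.99            # the paper's example: 100 urns, fill rate 0.6, confidence 0.99

def fill_probability(r, trials=20_000):
    """Estimate P[at least f*N urns are hit] when r rings each land in a uniformly
    random urn -- a coarse stand-in for the paper's urn model."""
    need = int(f * N)
    hits = sum(len({random.randrange(N) for _ in range(r)}) >= need
               for _ in range(trials))
    return hits / trials

# Section 6.4 reports that r = 114 rings suffice for these parameters; the coarse
# simulation should report a probability in the same region.
print(fill_probability(114))
```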
quotes
- “Good TRNG design rests on the quality of three components: Entropy Source […] Harvesting Mechanism […] Postprocessing”
- “A postprocessor may be as simple as a von Neumann corrector [4] or may be as complicated as an extractor function [1] or a one-way hash function such as SHA-1 [4]. One should scrutinize postprocessors which modify the output conditionally on its statistical properties. There is great danger in deterministic methods aimed at improving the ‘appearance of randomness.’”
- “Inspired by these works and observations, we develop a set of requirements for our TRNG design:
- The design should be purely digital (no analog components are allowed).
- The harvesting mechanism should be simple (easy to analyze); it should preserve and optimally sample the entropy source. In other words, the unpredictability of the TRNG should be based not on the complexity of the harvesting mechanism, but only on the unpredictability of the entropy source.
- A strict mathematical justification of the entropy collection mechanism should be given, with all assumptions clearly stated and at least empirically justified. The design should be sufficiently simple to allow rigorous analysis.
- No correction circuits are allowed. Many times an adaptive correction circuit is used either to ‘adjust the sampling frequency’ or to ‘smooth the output distribution.’ Since most such circuits use the characteristic of the output to adaptively process the entropy source, they introduce further correlations. For instance, a correction circuit that counts the number of ones and zeros and compensates for the delay of a sampler accordingly will clearly introduce further bias to the output sequence.
- Compact and efficient design (high throughput per area and energy spent). No amplifiers or other analog components are allowed, which would consume more energy and make the analysis difficult. Note that, since we are not allowing analog components, we have to sample variations in the time domain (such as the design in [8] does) rather than variations in the voltage levels. This criterion also means that we cannot use complicated post-processing schemes (e.g., SHA-1).”
- “Oscillators provide a simple and effective method to build TRNGs [5]. A simple digital oscillator may be built by chaining an odd number of inverter gates in a ring configuration.”
- “A practical configuration for harvesting jitter is based on the idea of sampling the output of a ring oscillator using the output of another oscillator. This configuration is commonly referred to as coupled oscillators. If the periods of the two oscillators are well matched, then it should be that, with high probability, we are sampling from the transition zones and not the deterministic part of the waveform.”
- “Unfortunately, problems arise in practical realization of coupled oscillators:
- Exactly matching the period of the two oscillators is fairly difficult and requires the use of special layout design techniques at the VLSI level.
- Due to imperfections, the two signals may drift relative to one another. This makes for very fragile TRNG designs.”
- “We take the following axiomatic approach, which, for the time being, neglects phase drift. Suppose Rj is an oscillator ring with period T = Tj. That is, at times 0, T, 2T, …, the signal is designed to switch from low to high and, at times T/2, 3T/2, 5T/2, …, the signal is designed to switch from high to low.”
- “Fortunately, the claim in our axiom is strongly supported by empirical evidence. In [2], for example, over a million oscilloscope captures (sampling at 4 Giga samples/sec) of jitter from a single oscillator ring with 83 inverters having T ≈ 146.8 ns (f = 6.81 MHz) were displayed and exhibited classical bell curve behavior.”
- “We will call these subintervals ‘urns’ because, as we shall soon see, the problem which now faces us is one well-known to probabilists who study urn models.”
- “Looking at Fig. 3, one notices that there is ‘waste’ of entropy when two rings are in transition at the same point in time. One is therefore tempted to find ways to minimize such overlap. It has been proposed that, in order to fill as many urns as possible and in order to make the behavior of the r rings as independent as possible, the ring lengths n1, …, nr should be pairwise relatively prime integers.”
- “Moreover, the entire motivation for the relatively prime ring length model is unclear.”
- “For a given number N of urns, a given fill rate 0 < f ≤ 1, and a given level of confidence 0 < p < 1, we would like to determine the minimum number r = M(N, f, p) of rings necessary so that, among the N urns, the event that at least fN are filled has probability at least p. For f = 1, this is the Coupon Collector’s Problem.”
- “we will prefer to keep the confidence p close to one while decreasing the fill rate f and we will present a rigorous postprocessing strategy in Section 7 to recover full confidence in the quality of the bits generated by the overall design.”
- “An (n, m, t)-resilient function is a function F(x1, x2, …, xn) = (y1, y2, …, ym) from ℤ2^n to ℤ2^m enjoying the property that, for any t coordinates i1, …, it, for any constants a1, …, at from ℤ2 and any element y of the codomain, Prob[F(x) = y | xi1 = a1, …, xit = at] = 1/2^m. In the computation of this probability, all xi for i ∉ {i1, …, it} are viewed as independent random variables, each of which takes on the value 0 or 1 with probability 0.5.”
- “In more informal terms, knowledge of any t values of the input to the function does not allow one to make any better than a random guess at the output.”
- “Theorem 1 (e.g., [15]). Let G be a generator matrix for an (n, m, d)-linear code C. Define a function f: {0,1}^n → {0,1}^m by the rule f(x) = xG^T. Then, f is an (n, m, d−1)-resilient function.”
- “[…] if one wishes to implement the resilient function in hardware, this can still be achieved efficiently. All one has to do is to implement a vector times a (constant) matrix product, as described in Theorem 1.”
- “There is a trade-off between code length and the size of the buffers we need to use in the implementation of the resilient function. A code of short length is easy to implement and requires smaller buffers, but using such a code runs a higher risk of being compromised when there is a burst of errors created by natural causes or by an intelligent attacker.”
- “Definition 2. The simplex code Σm, the dual of the Hamming code Hm, is a [2^m−1, m, 2^(m−1)] linear code.”
- “A 4 bit version of a binary XOR-tree is shown on the top in Fig. 5. In the first level, the ring outputs are XORed together in a pairwise manner. If any single one of these XOR gates are faulted by the adversary, fixing its output to either a zero or one output bits, the effect may be seen as two of the ring outputs having become deterministic and, as before, the bias will be eliminated by the resilient function as much as the built-in strength parameter t permits.”
- “[…] the output lines should be sampled one after another in a sequential scheme.”
- “If our rings use 13 inverters, then experimental evidence shows that the period is roughly 25 ns and the standard deviation for the jitter random variable is σ=0.5ns [2]. So, σ=0.02T. Now, we want all samples to be within (1/4)σ of the mean of some jitter event. So, with tolerance (μ−σ/4, μ+σ/4), we can say that 1 percent of the spectrum is filled with jitter for each ring. So, we have N = 100 urns in our combinatorial model. We note that the tolerance 1/4 ensures that each generated bit will yield at least 0.97 bits of entropy per sampled bit.”
- “With these parameters, the formulae developed in Section 6.4 (or last row, third entry in Table 5) tell us that r = 114 rings will be enough to fill at least 0.60N of the urns with probability at least 0.99. The output is sampled from the 114 rings and fed into a resilient function. Now, for our resilient function, we employ a [256, 16, 113]-code which is a known extended BCH code [14]. This means that the samples are grouped into blocks of 256 bits and fed into the resilient function which returns only 16 bits. The code we selected has minimum distance d = 113 and, therefore, the resilience of the associated resilient function is t = 112. With a block length of 256 and fill rate of 0.60, out of the 256 bits which go into our resilient function, (1 - 0.6) · 256 = 102.4 ≈ 103 bits bits will be deterministic. Since our resilient function can tolerate up to 112 corrupted bits, the design has an additional margin to resist additional (adversarial and nonadversarial) faults and errors of up to 9 bits.”
- “The output is 16 bits per each 256 bits sampled. These 16 bits each have 0.97 bits of entropy. Since the frequency of the circuit is 1/(25 ns) = 40 MHz, this model gives us a random stream with bit rate of 40·16/256 = 2.5 Mbps, where each bit carries 0.97 bits of entropy.”
summary
A very good paper. It specifies the requirements of a True Random Number Generator, builds up the theory with mathematical background and finally presents one particular instance with desired properties.
I struggled seeing the connection to linear codes which is given by Theorem 1, but I assume one just has to follow reference [15] which proves Theorem 1.
A good TRNG relies on {entropy source, harvesting mechanism, postprocessing}. The design uses ring oscillators (quantified by the urn model with relatively prime ring lengths and the Coupon Collector's Problem), an XOR tree, and a linear code, namely a simplex code (justified by the (n, m, t)-resilient function model).
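To make Theorem 1 concrete for myself, a toy instance (not the [256, 16, 113] BCH code used in the paper): the m = 3 simplex code gives a (7, 3, 3)-resilient function, which can be checked by brute force.

```python
import itertools
import numpy as np

# Generator matrix of the [7, 3, 4] simplex code: its columns are all non-zero
# binary vectors of length 3.  By Theorem 1, f(x) = x * G^T is (7, 3, 3)-resilient.
G = np.array([v for v in itertools.product([0, 1], repeat=3) if any(v)]).T

def f(x):
    return tuple(G.dot(x) % 2)

# Brute-force check: fix any 3 input bits to any constants; the 3 output bits must
# still be uniform over the 16 assignments of the remaining 4 free input bits.
for fixed_pos in itertools.combinations(range(7), 3):
    free_pos = [i for i in range(7) if i not in fixed_pos]
    for fixed_val in itertools.product([0, 1], repeat=3):
        counts = {}
        for free_val in itertools.product([0, 1], repeat=4):
            x = np.zeros(7, dtype=int)
            x[list(fixed_pos)] = fixed_val
            x[free_pos] = free_val
            counts[f(x)] = counts.get(f(x), 0) + 1
        assert all(c == 2 for c in counts.values())     # 16 inputs / 8 outputs = 2 each
print("x -> x * G^T for the [7, 3, 4] simplex code is 3-resilient")
```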
Prior work:
- [6] amplification, requires significant power
- [4] similar
- [7] samples PLL jitter
- [8] samples LSFR and cellular automaton, complicated harvesting
- [11] metastable circuits
Test suites:
- DIEHARD [12]
- NIST test suites [13]
4 gigasamples/sec = 4·10^9 samples/s corresponds to 1/(4·10^9) s = 0.25 ns between two samples
Figure 5 shows a neat network to make the XOR tree more robust by redundancy.
A Replication Study on Measuring the Growth of Open So… §
Title: “A Replication Study on Measuring the Growth of Open Source” by Michael Dorner, Maximilian Capraro, Ann Barcomb, Krzysztof Wnuk [url] [dblp]
Published in 2022-01 and I read it in 2022-04
Abstract: Context: Over the last decades, open-source software has pervaded the software industry and has become one of the key pillars in software engineering. The incomparable growth of open source reflected that pervasion: Prior work described open source as a whole to be growing linearly, polynomially, or even exponentially. Objective: In this study, we explore the long-term growth of open source and corroborating previous findings by replicating previous studies on measuring the growth of open source projects. Method: We replicate four existing measurements on the growth of open source on a sample of 172,833 open-source projects using Open Hub as the measurement system: We analyzed lines of code, commits, new projects, and the number of open-source contributors over the last 30 years in the known open-source universe. Results: We found growth of open source to be exhausted: After an initial exponential growth, all measurements show a monotonic downwards trend since its peak in 2013. None of the existing growth models could stand the test of time. Conclusion: Our results raise more questions on the growth of open source and the representativeness of Open Hub as a proxy for describing open source. We discuss multiple interpretations for our observations and encourage further research using alternative data sets.
quotes
- “Prior work described open source as a whole to be growing linearly, polynomially, or even exponentially.” (Dorner et al., 2022)
- “We replicate four existing measurements on the growth of open source on a sample of 172,833 open-source projects using Open Hub as the measurement system: We analyzed lines of code, commits, new projects, and the number of open-source contributors over the last 30 years in the known open-source universe.” (Dorner et al., 2022, p. 1)
- “Open source has evolved from small communities of volunteers driven by non-monetary incentives to foundations that host large projects and support decentralized innovation among many global industries [13]” (Dorner et al., 2022, p. 2)
- “Three horizontal, longitudinal studies investigated the growth of open source as a whole: [9] from 2003, [24] from 2007, and in 2008 [11].” (Dorner et al., 2022, p. 2)
- “In detail, the contributions of this paper are:
- a detailed discussion of the three prior studies,
- the multi-dimensional measurements using Open Hub as a measuring system to quantify the growth of open source with respect to lines of code, commits, contributors, and projects, and, thereby,
- the dependent and independent replication of the measurements by three prior studies.” (Dorner et al., 2022, p. 2)
- “Study A [9] from 2003 at FreshMeat.net with 406 projects accounted until 2002-07-01
Study B [24] from 2007 at SourceForge.net with 4,047 projects
Study C [11] from 2008 at Ohloh.net with 5,122 projects from 1995-01-01 until 2006-12-31
from Table 1: Three prior [9, 11, 24] studies on the evolution of open source, showing data source, project sample size, and the considered data time frame.” (Dorner et al., 2022, p. 3)
- “Their construction is unclear and, in consequence, we cannot replicate the measurements.” (Dorner et al., 2022, p. 4)
- “Vitality V is defined by V = R·A/L (1) where—according to description—R is the number of releases in a given period (t), A is the age of the project in days, and L is the number of releases in the period t.” (Dorner et al., 2022, p. 4)
- “However, a comment is a valuable contribution to a software project. In modern programming languages like Go or Python, comments are directly embedded in the source code for documentation purposes.” (Dorner et al., 2022, p. 7)
- “Therefore, our measurement is consistent with Hypothesis 1, which states that open source grows (in bytes).” (Dorner et al., 2022, p. 7)
- “When commit size is calculated as the total number of commits in a period, it follows a power-law distribution [27]” (Dorner et al., 2022, p. 8)
- “At the time of the data collection (2021-06-04 to 2021-06-07), Open Hub lists 355,111 open-source projects as of 2021-06-06.” (Dorner et al., 2022, p. 10)
- “After filtering our data set for the time frame from 1991-01-01 to 2020-12-31, 173,265 projects were available for further analysis.” (Dorner et al., 2022, p. 10)
- “Observation 2: Although initially growing exponentially until 2009, the growth in lines of code has continuously slowed down since 2013.” (Dorner et al., 2022, p. 13)
- “Observation 4: Although growing exponentially until 2009 and reaching its peak in March 2013 with 107,915 contributors, the number of open-source contributors has, as of 2018, decreased to the level of 2008.” (Dorner et al., 2022, p. 14)
- “We also encountered similar problems while crawling Open Hub: 181,846 of 355,111 projects do not contain information on the development activity. We also found that the accumulated number of lines added does not fit the measured lines of code. Additionally, we found a large drop in added projects in 2011 we are not able to explain (Figure 5). We speculate that Open Hub could have decreased the number of projects it adds, so that newer projects are under-represented.” (Dorner et al., 2022, p. 16)
- “In this study, we conducted a large-scale sample study on open-source projects and their cumulative growth. We analyzed the number of developers and their contributions with respect to lines of code, commits, and new projects to the open-source universe. We leveraged Open Hub as a measuring system to measure development activities of 172,833 open-source projects over the last 30 years concerning those four quantities.” (Dorner et al., 2022, p. 19)
replications
- exact replications
  - dependent: keep all conditions identical
  - independent: vary one or more major aspects
summary
In this study (I read version #5), the authors replicate results from 2003 (Capiluppi, A., Lago, P., Morisio, M., “Characteristics of open source projects”), 2007 (Koch, S., “Software evolution in open source projects—a large-scale investigation”), and 2008 (Deshpande, A., Riehle, D., “The Total Growth of Open Source”) on the OpenHub project index. Open Hub lists 355,111 open-source projects as of 2021-06-06 and the development activity is available for 173,305 projects. The previous studies claimed that Open Source grows w.r.t. byte size (2003), grows quadratically (2007) or exponentially (2008) w.r.t. lines of code as well as exponentially w.r.t. projects (2008).
For the 2003 study, the authors remark “their construction is unclear and, in consequence, we cannot replicate the measurements”. The status (e.g. “beta”, “mature”) is also determined in an unclear way. The 2003 study’s lack of specification regarding the magnitude is also weak (“grows”?!).
In essence, the data sources from 2003 and 2007 are not publicly available anymore. As such the statements need to be replicated with new data for an extended timeline. The result is that the authors recognize a peak around 2010 and 2013 and a steady decline afterwards in various parameters (size in bytes, LoCs, # of projects, new projects).
The paper is well-written and the intention is clear. A more thorough investigation around the peaks from 2010 and 2013 could be done, since they occur in multiple parameters (cf. Fig. 2 and Fig. 3) and thus seem significant. I suggest validating against the hypothesis that activities around GitHub or the Linux Foundation influenced the way projects are maintained. On page 16, it is mentioned that 181,846 projects do not contain information on the development activity. It should be explicitly pointed out that those are not included in the respective figures (at least this is what I assume).
The paper shows how bad the situation regarding empirical replication of data in software development is. On the other hand, it shows how it improved because version control and public availability of data improved. My personal summary is just that Open Source changed over time and since 2013 in particular.
A Side-channel Resistant Implementation of SABER §
Title: “A Side-channel Resistant Implementation of SABER” by Michiel Van Beirendonck, Jan-Pieter D'Anvers, Angshuman Karmakar, Josep Balasch, Ingrid Verbauwhede [url] [dblp]
Published in 2020 at IACR eprint 2020 and I read it in 2020-11
Abstract: The candidates for the NIST Post-Quantum Cryptography standardization have undergone extensive studies on efficiency and theoretical security, but research on their side-channel security is largely lacking. This remains a considerable obstacle for their real-world deployment, where side-channel security can be a critical requirement. This work describes a side-channel resistant instance of Saber, one of the lattice-based candidates, using masking as a countermeasure. Saber proves to be very efficient to mask due to two specific design choices: power-of-two moduli, and limited noise sampling of learning with rounding. A major challenge in masking lattice-based cryptosystems is the integration of bit-wise operations with arithmetic masking, requiring algorithms to securely convert between masked representations. The described design includes a novel primitive for masked logical shifting on arithmetic shares, as well as adapts an existing masked binomial sampler for Saber. An implementation is provided for an ARM Cortex-M4 microcontroller, and its side-channel resistance is experimentally demonstrated. The masked implementation features a 2.5x overhead factor, significantly lower than the 5.7x previously reported for a masked variant of NewHope. Masked key decapsulation requires less than 3,000,000 cycles on the Cortex-M4 and consumes less than 12kB of dynamic memory, making it suitable for deployment in embedded platforms.
open questions
- “Saber.Masked.KEM.Decaps does not require multiplication of two masked polynomials, which is a significantly more expensive computation.”
It doesn't?
- “A possible countermeasure is to randomize the order of execution of these vulnerable routines. Randomness should be used to shuffle the order of operations in Saber’s multiplication or introduce dummy operations.”
Is there an actual implementation?
quotes
- “This work describes a side-channel resistant instance of Saber, one of the lattice-based candidates, using masking as a countermeasure.”
- “NIST has already announced that, in the second round, more stress will be put on implementation aspects.”
- “The security of both problems relies on introducing noise into a linear equation. However, in LWE-based schemes the noise is explicitly generated and added to the equation, while the LWR problem introduces noise through rounding of some least significant bits.”
- “In our masked implementation of Saber, we develop a novel primitive to perform masked logical shifting on arithmetic shares.”
- “Furthermore, Saber avoids excessive noise sampling due to its choice for LWR.”
- “We integrate and profile our masked CCA-secure decapsulation in the PQM4 [KRSS] post-quantum benchmark suite for the Cortex-M4, showing our close-to-ideal 2.5x overhead in CPU cycles. This factor can directly be compared to the overhead factor 5.7x reported by Oder et al., which is the work most closely related to ours, and we show that it can largely be attributed to the masking-friendly design choices of Saber.”
- “We say that a PKE is δ-correct if P[Decrypt(sk, ct) ≠ m : ct ← Encrypt(pk, m)] ≤ δ.” (i.e. smaller is better)
- “The Saber package is based on the Module Learning With Rounding (MLWR) problem, and its security can be reduced to the security of this problem. MLWR is a variant of the well known Learning With Errors (LWE) problem [Reg04], which combines a module structure as introduced by Langlois and Stehlé [LS15] with the introduction of noise through rounding as proposed by Banerjee et al. [BPR12].”
- “The additions with the constant terms h 1 , h 2 and h are needed to center the errors introduced by rounding around 0, which reduce the failure probability of the protocol.”
- “The reason being that even without side-channel information, the Saber.PKE is vulnerable to chosen-ciphertext attacks if the secret key is re-used, which was shown by Fluhrer [Flu16] to be the case for all current LWE-based and LWR-based IND-CPA secure encryption schemes.”
- “Several secure A2B as well as B2A conversion algorithms exist. These generally come in two flavours, depending on whether the arithmetic shares use a power-of-two or a prime modulus. The former group have received considerably more research interest due to their use in symmetric primitives, and they are typically more efficient and simpler to implement.”
- “Finally, Schneider et al. [SPOG19] combine the previous two algorithms, and at the same time present a new algorithm, B2Aq, which works for arbitrary moduli as well as arbitrary security orders. However, when instantiated as a power-of-two conversion, e.g. q = 28, B2Aq only outperforms [BCZ18] and [CGV14] for more than nine shares.”
- “In the remainder of this section, we first describe the Coron-Tchulkine [CT03] table-based A2B algorithm, including the fix from [Deb12].”
- “Another similarity we share with [KMRV18] is the use of the ARM Cortex-M4’s support for SIMD instructions to speed up execution.”
- “We use the Test Vector Leakage Assessment (TVLA) methodology introduced by Goodwill et al. [GJJR11] in order to validate the security of our implementation.”
- “In our experiments we use a non-specific fix vs. random test.”
- “TVLA uses the Welch’s t-test to detect differences in the mean power consumption between the two sets.”
- “After 100 000 measurements, our t-test results for Saber.Masked.KEM.Decaps with masks ON still show some slight excursions past the ±4.5 confidence boundary. This is sometimes expected for long traces, and therefore, as per [GJJR11], we conduct a second independent t-test showing that these excursions are never at the same time instant.”
- “Our most efficient design is (D), where both A2A tables and the SecBitSlicedSampler are implemented.”
- “Masking has so far received limited attention in post-quantum cryptography, but will become increasingly important in the continuation of the NIST standardization process.”
- “Oder et al. do not present the dynamic memory consumption for an unmasked design, such that we only make the masked comparison for that performance metric.”
- “From Table 5, these two A2A conversions take roughly 60,000 CPU cycles, whereas masked sampling of four error polynomials from β μ would take approximately 1,026,000 CPU cycles. The high cost of masked binomial sampling is further illustrated in [OSPG18] (Table 2), where roughly 71% of the decapsulation’s CPU cycles are spent in the masked sampling routine.”
- “A possible countermeasure is to randomize the order of execution of these vulnerable routines. Randomness should be used to shuffle the order of operations in Saber’s multiplication or introduce dummy operations.”
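The TVLA evaluation quoted above boils down to a pointwise Welch's t-test over a fix-vs-random partition of the traces; a minimal sketch (array shapes and names are mine):

```python
import numpy as np

def welch_t(fixed, rand):
    """Pointwise Welch's t-statistic between fixed-input and random-input trace sets
    (non-specific fix vs. random test); |t| > 4.5 at any sample point is the usual
    TVLA evidence for first-order leakage."""
    m1, m2 = fixed.mean(axis=0), rand.mean(axis=0)
    v1, v2 = fixed.var(axis=0, ddof=1), rand.var(axis=0, ddof=1)
    return (m1 - m2) / np.sqrt(v1 / len(fixed) + v2 / len(rand))

# fixed, rand: (#traces, #samples) arrays of power measurements
# leaky_samples = np.flatnonzero(np.abs(welch_t(fixed, rand)) > 4.5)
```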
summary
- Table 2 is an illustration for ℤ4
- “From this table, it can be seen that the linear operations, i.e. polynomial arithmetic, have roughly a factor 2x overhead in the masked design, due to the duplication of every polynomial multiplication. Non-linear operations, on the other hand, have overhead factors ranging from 7x for A2A conversion to 23x for binomial sampling. Our design requires 5048 random bytes, and spends roughly 100,000 cycles sampling these from the TRNG”
- Saber.Masked.PKE.Dec takes 1.96 times as long in the masked version
- Saber.Masked.PKE.Enc takes 2.48 times as long in the masked version
typo
- Symmetric crypto primitive names are printed in math mode, making the kerning terrible.
- “Where Reparaz et al. successfully masked a Chosen-Plaintext Attack (CPA)-secure RLWE decryption, real-world applications typically require Chosen-Ciphertext Attack (CCA) secure primitives, which can be obtained using an appropriate CCA-transform.”
“Whereas Reparaz et al. successfully masked a Chosen-Plaintext Attack (CPA)-secure RLWE decryption, real-world applications typically require Chosen-Ciphertext Attack (CCA) secure primitives, which can be obtained using an appropriate CCA-transform.”
- “A first-order masking splits any sensitive variable x in the algorithm into two shares x1 and x2, such that x = x1 ☉ x2, and perform all operations in the algorithm on the shares separately.”
“A first-order masking splits any sensitive variable x in the algorithm into two shares x1 and x2, such that x = x1 ☉ x2, and performs all operations in the algorithm on the shares separately.”
- “In grey the operations that are influenced by the long term secret s and thus vulnerable to side-channel attacks.”
“In grey are operations that are influenced by the long term secret s and thus vulnerable to side-channel attacks.”
- “Even though previous attacks have focused on schoolbook mulitplication …”
“Even though previous attacks have focused on schoolbook multiplication …”
A Sound Method for Switching between Boolean and Arith… §
Title: “A Sound Method for Switching between Boolean and Arithmetic Masking” by Louis Goubin [url] [dblp]
Published in 2001 at CHES 2001 and I read it in 2021-10
Abstract: Since the announcement of the Differential Power Analysis (DPA) by Paul Kocher and al., several countermeasures were proposed in order to protect software implementations of cryptographic algorithms. In an attempt to reduce the resulting memory and execution time overhead, a general method was recently proposed, consisting in “masking” all the intermediate data.
quotes
- “we present two new ‘BooleanToArithmetic’ and ‘ArithmeticToBoolean’ algorithms, proven secure against DPA attacks. Each of these algorithms uses only very simple operations: ‘XOR’, ‘AND’, subtractions and ‘logical shift left’. Our ‘Boolean-ToArithmetic’ algorithm uses a constant number (namely 7) of such elementary operations, whereas the number of elementary operations involved in our ‘ArithmeticToBoolean’ algorithm is proportional (namely equal to 5K + 5) to the size (i.e. the number K of bits) of the processor registers.”
- “In the present paper, we focus on the “masking method”, initially suggested by Goubin and Patarin in [10], and studied further in [11].”
- “In this paper, we solved the following open problem (stated in [6]):
find an efficient algorithm for converting from boolean masking to arithmetic masking and conversely, in which all intermediate variables are decorrelated from the data to be masked, so that it is secure against DPA”
summary
One of the most beautiful papers I have read.
- It uses established mathematical definitions.
- All steps are well-documented and can be read linearly.
- It solves a generic problem.
- Boolean masking: x' = x ⊕ r
Arithmetic masking: A = x − r mod 2^K (see the sketch below)
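The sketch referenced above: the ‘BooleanToArithmetic’ direction in its seven elementary operations, as I read the paper (word size, variable names, and the self-check are mine):

```python
K = 16                           # register width in bits (my choice)
MASK = (1 << K) - 1

def boolean_to_arithmetic(x_bool, r, gamma):
    """From the Boolean masking x = x_bool XOR r, compute A with x = A + r (mod 2^K),
    using 7 elementary operations and one fresh random value gamma."""
    T = x_bool ^ gamma
    T = (T - gamma) & MASK
    T = T ^ x_bool
    gamma = gamma ^ r
    A = x_bool ^ gamma
    A = (A - gamma) & MASK
    A = A ^ T
    return A

# self-check with arbitrary values
import secrets
x, r, gamma = 0x1234, secrets.randbits(K), secrets.randbits(K)
assert (boolean_to_arithmetic(x ^ r, r, gamma) + r) & MASK == x
```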
Minor issues:
- “Appendix N” actually means “Annex N”
- Where does Corollary 1.2 come from? It is not very explicit, thus I want to analyze it:
A = (x' ⊕ r) − r = Φx'(r) = Ψx'(γ) ⊕ Ψx'(r ⊕ γ) = [(x' ⊕ γ) − γ] ⊕ x' ⊕ [(x' ⊕ (r ⊕ γ)) − (r ⊕ γ)]
- For the Proof of Lemma 1, remember that AND is right-distributive over XOR
- In the Proof of Lemma 3, two's-complement representation is assumed. Thus the entire algorithm only works under this assumption.
A system for typesetting mathematics §
Title: “A system for typesetting mathematics” by Brian W. Kernighan, Lorinda L. Cherry [url] [dblp]
Published in 1975 and I read it in 2022-01
Abstract: The syntax of the language is specified by a small context-free grammar; a compiler-compiler is used to make a compiler that translates this language into typesetting commands. Output may be produced on either a phototypesetter or on a terminal with forward and reverse half-line motions. The system interfaces directly with text formatting programs, so mixtures of text and mathematics may be handled simply. This paper was typeset by the authors using the system described.
error
- “assume that that sqrt(a+b)”
“assume that sqrt(a+b)”
quotes
- “Mathematics is known in the trade as difficult, or penalty, copy because it is slower, more difficult, and more expensive to set in type than any other kind of copy normally occuring in books and journals”
- “On UNIX, the phototypesetter is driven by a formatting program called TROFF. TROFF was designed for setting running text.”
- “Thus the language should not assume, for instance, that parentheses are always balanced, for they are not in the half-open interval (a,b]. Nor should it assume that that sqrt(a+b) can be replaced by (a+b)^1/2, or that 1/(1-x) is better written as 1/(1-x) (or vice versa).”
- “A secondary, but still important, goal in our design was that the system should be easy to implement, since neither of the authors had any desire to make a long-term project of it.”
- “The standard mode of operation is that when a document is typed, mathematical expressions are input as part of the text, but marked by user settable delimiters.”
- “Input is free-form. Spaces and new lines in the input are used by EQN to separate pieces of the input; they are not used to create space in the output.”
- “Free-form input is easier to type initially; subsequent editing is also easier, for an expression may be typed as many short lines.”
- “Extra white space can be forced into the output by several characters of various sizes. A tilde ‘~’ gives a space equal to the normal word spacing in text; a circumflex ‘^’ gives half this much, and a tab character spaces to the next tab stop.”
- “Here spaces are necessary in the input to indicate that sin, pi, int, and omega are special, and potentially worth special treatment. EQN looks up each such string of characters in a table, and if appropriate gives it a translation.”
- “The spaces after the 2's are necessary to mark the end of the superscripts;”
- “Braces {} are used to group objects together; in this case they indicate unambiguously what goes over what on the left-hand side of the expression.”
- “Centering and making the ∑ big enough and the limits smaller are all automatic.”
- “There is a facility for making braces, brackets, parentheses, and vertical bars of the right height, using the keywords left and right:”
- “Thus we can say
lim~ roman "sup" ~x sub n = 0
to ensure that the supremum doesn't become a superscript:
lim sup xn = 0”
- “That is, we assume x^a^b is x^(a^b), not (x^a)^b.”
- “One of our users commented that although the output is not as good as the best hand-set material, it is still better than average, and much better than the worst.”
summary
A design paper introducing a fundamental tool in the UNIX space. It is neat to see how the fundamentals of mathematical typesetting developed. It is truly an achievement that the paper is set using eqn & troff itself. However, the formal grammar behind eqn is not formalized rigorously and seems pretty complex (custom disambiguation rules).
- unlike Teχ, a different representation in display mode and inline mode is not considered
- On the one hand, the design implies a 1:1 mapping to its representation (due to the fact that secretaries are meant to be able to write the formulas by looking at them) and simultaneously rejects it in its syntax because “pi over 2” does not specify whether it should be \frac{\pi}{2}, \pi/2 or \pi÷2
- The paper says “nor should it assume that sqrt(a+b) can be replaced by (a+b)^{1/2}”
This seems universally true in all mathematical domains?! Apparently, it might mean that the representation cannot be changed because maybe x^{\frac12} stands for the 2nd component of the first vector of x (x is a sequence of vectors).
- If multiple whitespaces collapse and white space can be introduced between semantic units at will, the syntax seems to be called “free-form input”.
- Braces {} are used to disambiguate groups just like in the Teχ case
- An arrow to iterate towards a limit (x→0) is denoted “x -> 0”
- \left and \right is introduced. Instead of Teχ's “\right.”, the \right can just be dropped
- In a case distinction, you have a large brace and cases with similar structure (“if” keywords shall be aligned). This syntax lists the elements column-wise, unlike Teχ, which lists them row-wise (see the LaTeX comparison after this list)
- diacritics are supported via special commands (e.g. “x dotdot” for ẍ)
- Indeed, in mathematics x^a^b usually means x^(a^b) because (x^a)^b could be written x^(a*b) in a simplified manner
- “Printed books cannot compete with the birds and flowers of illuminated manuscripts on esthetic grounds, either, but they have some clear economic advantages.”
I learned that this must refer to graphical elements (birds, flowers) common in illuminated manuscripts and that illuminated manuscripts take longer to produce than the eqn pipeline.
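For contrast with eqn's column-wise piles mentioned in the case-distinction bullet above, the row-wise TeX/LaTeX form (with amsmath) looks like this:

```latex
% each row is one "value & condition" pair; eqn instead stacks the value column
% and the condition column as separate piles
f(x) =
\begin{cases}
  0 & \text{if } x < 0 \\
  1 & \text{if } x \geq 0
\end{cases}
```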
Additively Homomorphic Ring-LWE Masking §
Title: “Additively Homomorphic Ring-LWE Masking” by Oscar Reparaz, Ruan de Clercq, Sujoy Sinha Roy, Frederik Vercauteren, Ingrid Verbauwhede [url] [dblp]
Published in 2016 at PQCrypto 2016 and I read it in 2020-07
Abstract: In this paper, we present a new masking scheme for ringLWE decryption. Our scheme exploits the additively-homomorphic property of the existing ring-LWE encryption schemes and computes an additive-mask as an encryption of a random message. Our solution differs in several aspects from the recent masked ring-LWE implementation by Reparaz et al. presented at CHES 2015; most notably we do not require a masked decoder but work with a conventional, unmasked decoder. As such, we can secure a ring-LWE implementation using additive masking with minimal changes. Our masking scheme is also very generic in the sense that it can be applied to other additively-homomorphic encryption schemes.
quotes
The most important statement is equation 3:
decryption(c1, c2) ⊕ decryption(c1’, c2’) = decryption(c1 + c1’, c2 + c2’)
- “we do not require a masked decoder but work with a conventional, unmasked decoder.”
- “Masking [CJRR99,GP99] is a provable sound countermeasure against DPA.”
- “A caveat of our approach is that we need to place additional assumptions on the underlying arithmetic hardware compared to the CHES 2015 approach.”
- “The operation ⊕ is the xor operation on bits or strings of bits.”
- “In the literature there are several encryption schemes based on the ring-LWE problem, for example [LPR10,FV12,BLLN13] etc.”
- “Among all the computations, polynomial multiplication is the costliest. Most of the reported implementations use the Number Theoretic Transform (NTT) to accelerate the polynomial multiplications.”
- “The proposed randomized decryption. To perform the decryption of (c1, c2) in a randomized way, the implementation follows the following steps:
- Internally generate a random message m’ unknown to the adversary
- Encrypt m’ to (c1’, c2’)
- Perform decryption(c1 + c1’, c2 + c2’) to recover m ⊕ m’.
- The masked recovered message is the tuple (m’, m ⊕ m’).”
- “This approach has the nice property of not requiring a masked decoder.”
- “The obvious disadvantage is that extra circuitry or code is required to perform the encryption. Another disadvantage is the increased decryption failure rate. When two ciphertexts are added, the amount of noise increases. The added noise increases the decryption failure rate as we will see in Sect. 4.3.”
- “Our countermeasure can be thought of as ciphertext blinding.”
- “Thus, straightforward first-order DPA attack does not immediately apply. Nevertheless, more refined first-order DPA attacks do apply.”
- “In Appendix A we describe a strategy to detect whether s[i] = 0 or s[i] ≠ 0, which leads to an entropy loss.”
- “after all, Eq. 3 may seem to imply that the decoding function is linear. However, this is clearly not the case.”
- “When the masking is turned off, the decryption failure rate is 3.6 × 10⁻⁵ per bit. The failure rate increases to 3.3 × 10⁻³ per bit when the masking turned on.”
- “In terms of speed, the costliest process is the encryption. It is 2.8 times slower than the decryption.”
summary
A paper of rather low quality. The core essence is Equation 3: decryption(c1, c2) ⊕ decryption(c1’, c2’) = decryption(c1 + c1’, c2 + c2’). Besides that, there are many confusing statements. The workings of Equation 3 are barely mentioned (i.e. the correctness of Equation 3 is IMHO insufficiently discussed), but should be understandable for everyone in the field. Correctness is non-obvious, because we have an addition on the RHS and an XOR on the LHS. But it is trivial, because on the RHS we consider ℤq, whereas the LHS uses ℤ2, and the XOR is addition in ℤ2. Unlike the CHES 2015 paper, no additional circuitry is needed, which makes it a super interesting approach. The failure rate increases by a factor of 92 (3.6 × 10⁻⁵ → 3.3 × 10⁻³ per bit), which is a difficult problem of its own, but it is present in the CHES 2015 approach as well.
In summary, the decryption receives the ciphertext of the original message (⇒ (c1, c2)) and internally computes the encryption of some random value (⇒ (c1’, c2’)). Then, we don't compute dec(c1, c2), but dec(c1 + c1’, c2 + c2’).
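To make Equation 3 concrete, here is a minimal sketch in Python of a toy LPR-style ring-LWE scheme. The parameters (N = 256, q = 7681) and the ternary noise are chosen purely for illustration and are not the paper's scheme; the point is only to show that decrypting the summed ciphertexts yields m ⊕ m':

import random

N, Q = 256, 7681                # toy ring R_q = Z_q[x]/(x^N + 1), illustrative parameters only

def small():                    # "small" polynomial with coefficients in {-1, 0, 1}
    return [random.randint(-1, 1) for _ in range(N)]

def add(f, g):
    return [(a + b) % Q for a, b in zip(f, g)]

def mul(f, g):                  # negacyclic schoolbook multiplication in R_q
    res = [0] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < N:
                res[i + j] = (res[i + j] + a * b) % Q
            else:
                res[i + j - N] = (res[i + j - N] - a * b) % Q
    return res

def keygen(a):
    s, e = small(), small()
    return s, add(mul(a, s), e)                     # secret s, public key b = a*s + e

def encrypt(a, b, msg):                             # msg: list of N bits
    r, e1, e2 = small(), small(), small()
    enc = [(Q // 2) * m for m in msg]               # encode bit -> {0, q/2}
    return add(mul(a, r), e1), add(add(mul(b, r), e2), enc)

def decrypt(s, c1, c2):
    v = [(x - y) % Q for x, y in zip(c2, mul(c1, s))]         # c2 - c1*s = enc(msg) + noise
    return [1 if Q // 4 < x < 3 * Q // 4 else 0 for x in v]   # decode around q/2

a = [random.randrange(Q) for _ in range(N)]
s, b = keygen(a)
m = [random.randint(0, 1) for _ in range(N)]        # original message
m_mask = [random.randint(0, 1) for _ in range(N)]   # internally generated random mask m'
c1, c2 = encrypt(a, b, m)
c1m, c2m = encrypt(a, b, m_mask)

lhs = [x ^ y for x, y in zip(decrypt(s, c1, c2), decrypt(s, c1m, c2m))]   # dec(c) ⊕ dec(c')
rhs = decrypt(s, add(c1, c1m), add(c2, c2m))                              # dec(c + c')
print(lhs == rhs)                                                         # True

With the toy ternary noise above, the added noise never crosses the q/4 decoding threshold, so the check always succeeds; with realistic noise distributions, the doubled noise of the summed ciphertexts is exactly what causes the increased decryption failure rate discussed in the quotes.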
- “A caveat of our approach is that we need to place additional assumptions on the underlying arithmetic hardware compared to the CHES 2015 approach.”
- which ones? performance assumption on encryption?
- “Thus, straightforward first-order DPA attack does not immediately apply. Nevertheless, more refined first-order DPA attacks do apply.”
- “In particular, the practitioner should pay careful attention to leaking distances if implemented in software, since during the masked decoding both shares are handled in contiguous temporal locations.”
- undefined terminology “distances”
- “are handled in contiguous temporal locations” is not necessarily true (only likely true)
- “after all, Eq. 3 may seem to imply that the decoding function is linear. However, this is clearly not the case.”
- elaboration would be nice
- decryption failure is inherent to ring-LWE
- essentially, it is close to linearity
- but decryption is non-linear due to error
- and even neglecting the error as constant, the decryption failure is skewed here and thus linearity is not really given
- Figure 2 is incomprehensible and axes insufficiently explained
- Appendix A is vacuous. How can it be done? I guess the only statement is “it can be classified”, which is a trivial statement (considering template attacks) which does not need more justifications.
typo
- “The countermeasure makes harder the DPA attack” → “The countermeasure makes the DPA attack harder”
- “Note that the distribution of […] when s = 0 and […] is uniform random is different from the distribution of […] when s = 0.” → “Note that the distribution of […] when s = 0 and […] is uniform random but is different from the distribution of […] when s = 0.”
Aggregated Private Information Retrieval §
Title: “Aggregated Private Information Retrieval” by Lukas Helminger, Daniel Kales, Christian Rechberger [url] [dblp]
Published in 2020-05 and I read it in 2020-07
Abstract: With the outbreak of the coronavirus, governments rely more and more on location data shared by European mobile network operators to monitor the advancements of the disease. In order to comply with often strict privacy requirements, this location data, however, has to be anonymized, limiting its usefulness for making statements about a filtered part of the population, like already infected people.
quotes
- “In this research, we aim to assist with the disease tracking efforts by designing a protocol to detect coronavirus hotspots from mobile data while still maintaining compliance with privacy expectations.”
- “Governments in Italy, Germany, and Austria are relying on this metadata to monitor how people are complying with stay-at-home orders.”
- “we design a specialized private information retrieval (PIR) protocol.”
- “In this paper, we are interested in the single-server variants, namely computational PIR (CPIR), which rely on cryptographic hardness assumptions to hide the query from the server. Recent work has heavily improved on the original ideas of Chor et al. Many CPIR implementations use homomorphic encryption (HE) to hide the queries from the server while still allowing him to perform operations on the query.”
- “HE is a cryptographic primitive that allows performing computations on encrypted data without knowing the secret decryption key.”
- “We assume without loss of generality that the first column of the database of the server consists of unique identifiers.”
- “The threat model of the APIR protocol is similar to PIR protocols, i.e., the server should not know which elements were retrieved by the client.”
- “To prevent the client from learning individual entries, we make sure that the client’s list of identifiers has a guaranteed minimum cardinality and that each identifier is unique.”
- “We report multithreaded runtimes of 30 minutes for the standard APIR protocol, and 1 hour when extra steps are applied to ensure the input vector is not malicious.”
- “The RSA encryption scheme is homomorphic for multiplication and Paillier’s cryptosystem is homomorphic for addition. However, it was not until Gentry’s groundbreaking work from 2009 that we were able to construct the first fully homomorphic encryption (FHE) scheme, a scheme which in theory can evaluate an arbitrary circuit on encrypted data. His construction is based on ideal lattices and is deemed to be too impractical ever to be used, but it led the way to construct more efficient schemes in many following publications”
- “Once this noise becomes too large and exceeds a threshold, the ciphertext cannot be decrypted correctly anymore. We call such a scheme a somewhat homomorphic encryption scheme (SHE), a scheme that allows evaluating an arbitrary circuit over encrypted data up to a certain depth.”
- “In his work, Gentry introduced the novel bootstrapping technique, a procedure that reduces the noise in a ciphertext and can turn a (bootstrappable) SHE scheme into an FHE scheme.”
- “In many practical applications it is, therefore, faster to omit bootstrapping and choose a SHE scheme with large enough parameters to evaluate the desired circuit.”
- “two different types of PIR: information theoretic PIR (IT-PIR) protocols which rely on multiple, non-colluding servers to ensure privacy; and computational PIR (CPIR) where a single server manages the database and encryption is used hide the query.”
- “An established technique to achieve ε-differential privacy is the Laplace mechanism, i.e., to add Laplacian noise to the final result of the computation.”
- “H should not learn the movement pattern of any individual. More concretely, H is not allowed to query the location data for less than W different people. W has to be chosen in such a way that the data aggregation provides anonymity and its exact value will highly depend on the actual underlying data, which is the reason why we do not give a generic value in this paper.”
- “First, M could try to find the place of residence by assuming people sleep at home.”
- […] “remove those places for the heat map creation.”
- “The computationally most expensive phase in the protocol is the Data Aggregation phase, in which the server multiplies a huge matrix to a homomorphically encrypted input vector.”
summary
The paper implements a nice idea. The usefulness of the use case needs to be discussed with public health experts (how does it help to know that many infected people live in a particular block of houses?). However, I have been told the entire paper was written in 1 month, and that is quite impressive considering the technical depth in the field of Homomorphic Encryption.
There are many typos, and to me the main purpose of the protocol in Figure 1 was not comprehensible before talking to the authors. I also didn't understand in which ways ε-differential privacy is desirable, how it can be ensured, and which definition they used for “vector c is binary” before going into the details in section 3.2. Apparently, a binary vector is desirable to prevent leakage. For Figure 2, they used the Teχ package cryptocode to illustrate the protocol; to the best of my understanding, it is just a reiteration of Figure 1. On page 13, the paragraph “Note that, as we have already mentioned in Section 3.6” should be moved to the concluding remarks. On page 14, it is unclear what “isolated profiles” are. I didn't go through the details of section 5.
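To illustrate the “Data Aggregation phase” quoted above (the server multiplying a matrix with a homomorphically encrypted selection vector), here is a minimal sketch. It deliberately substitutes textbook Paillier encryption (additively homomorphic) for the lattice-based SHE scheme used in the paper, and the tiny primes, the five-row database and all names are invented, so it only demonstrates the principle: the client sends an encrypted binary selection vector, the server aggregates under encryption, and only the aggregate gets decrypted.

import math, random

def keygen(p, q):                         # textbook Paillier with g = n + 1 (Python >= 3.8)
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)                  # since g = n+1, L(g^lam mod n^2) = lam
    return n, lam, mu

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, lam, mu, c):
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

n, lam, mu = keygen(293, 433)             # toy primes, far too small for real use

# invented toy database: (identifier, sensitive flag per row, e.g. "infected")
database = [("id0", 1), ("id1", 0), ("id2", 1), ("id3", 1), ("id4", 0)]

# client: encrypted binary selection vector (rows 0, 2, 3); its cardinality must be >= W
selection = [1, 0, 1, 1, 0]
query = [encrypt(n, bit) for bit in selection]

# server: homomorphic inner product  sum_j bit_j * flag_j  =  prod_j Enc(bit_j)^flag_j
n2 = n * n
acc = encrypt(n, 0)
for c, (_, flag) in zip(query, database):
    acc = (acc * pow(c, flag, n2)) % n2

# client: decrypts only the aggregate count, never individual rows
print(decrypt(n, lam, mu, acc))           # -> 3 (all three selected rows have flag 1)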
BasicBlocker: Redesigning ISAs to Eliminate Speculative-Execution Attacks §
Title: “BasicBlocker: Redesigning ISAs to Eliminate Speculative-Execution Attacks” by Jan Philipp Thoma, Jakob Feldtkeller, Markus Krausz, Tim Güneysu, Daniel J. Bernstein [url] [dblp]
Published in 2020-07 and I read it in 2020-08
Abstract: Recent research has revealed an ever-growing class of microarchitectural attacks that exploit speculative execution, a standard feature in modern processors. Proposed and deployed countermeasures involve a variety of compiler updates, firmware updates, and hardware updates. None of the deployed countermeasures have convincing security arguments, and many of them have already been broken.
Comment: Preprint
quotes
- “The obvious way to simplify the analysis of speculative-execution attacks is to eliminate speculative execution. This is normally dismissed as being unacceptably expensive, but the underlying cost analyses consider only software written for current instruction-set architectures, so they do not rule out the possibility of a new instruction-set architecture providing acceptable performance without speculative execution.”
- “The IBM Stretch computer in 1961 automatically speculated that a conditional branch would not be taken: it began executing instructions after the conditional branch, and rolled the instructions back if it turned out that the conditional branch was taken.”
- “Software analyses in the 1980s such as [12] reported that programs branched every 4–6 instructions.”
- “The penalty for mispredictions grew past 10 cycles. Meanwhile the average number of instructions per cycle grew past two, so the cost of each mispredicted branch was more than 20 instructions.”
- “The P-cycle branch-misprediction cost is the time from early in the pipeline, when instructions are fetched, to late in the pipeline, when a branch instruction computes the next program counter.”
- “A branch delay slot means that a branch takes effect only after the next instruction.”
- “The recent paper [49] introduces a RISC-V extension to flush microarchitectural state and shows that the extension stops several covert channels.”
- “Another approach to ISA modifications against transient-execution attacks is to explicitly tell the CPU which values are secret, and to limit the microarchitectural operations that can be carried out on secret values [35, 50, 51].”
- “The standard separation of fetch from decode also means that every instruction is being speculatively fetched.”
- “Program code can be divided into basic blocks. To do so, all possible control flow paths are mapped into a directed graph called Control Flow Graph (CFG) so that each edge of the CFG represents a control flow from one instruction to the next. If two vertices A and B are connected by a single edge (A → B) with A having only a single outbound edge and B having only a single inbound edge, the vertices are merged if the two instructions are sequential in memory.”
- “The microarchitectural state of a CPU is affected only by instructions that will eventually be retired.”
- “The most important implication of this goal is that the CPU must abandon any speculative behavior. This eliminates a major source of complexity inside the security analysis of modern CPUs.”
- “Within this basic block, the CPU is allowed to fast-fetch instructions, knowing that upcoming instructions can be found in a sequential order in memory and will definitely be executed. That is, since per definition, within the basic block, no control flow changes can occur. The instruction further provides information whether the basic block is sequential, stating that the control flow continues with the next basic block in the sequence in memory. If a basic block does not contain a control-flow instruction it is therefore sequential.”
- “We also modified the behavior of existing control-flow instructions, such as bne, j and jalr.”
- “If the current basic block does not contain a control-flow instruction, which is indicated by the sequential flag of the bb instruction, the CPU can fetch the next bb instruction directly.”
- “A processor supporting the bb instruction is required to have an instruction counter IC, a target register T , a branch flag B, and an exception flag E, all initialized to 0 on processor reset and used only as defined below.”
- “We can seamlessly support hardware loop counters in our design concept. One new instruction (lcnt) is necessary to store the number of loop iterations into a dedicated register.”
- “BasicBlocker can also be used as a form of coarse-grained CFI [1,2], as it only allows control flow changes to beginnings of basic blocks, indicated by a bb instruction. This reduces the prospects of success for (JIT-)ROP [36, 37] attacks, as the variety of potential gadgets is reduced.”
- “One could easily extend our design to larger, more complex CPUs.”
- “The more parallel pipelines a CPU has, the more important it gets to build software with large basic blocks, as small basic blocks with branch dependent instructions cause a higher performance loss on a multi-issue CPU compared to a single-issue CPU.”
- “It would also be possible to integrate our solution into a secure enclave by providing a modified fetch unit for the enclave.”
- “The bb instruction does not fit into any of the existing RISC-V instruction types so that we define a new instruction type to achieve an optimal utilization of the instruction bits (Figure 6).”
- “We used a configuration with five stages (IF, ID, EX, MEM, WB) and 4096 byte, one-way instruction- and data cache.”
- “Our compiler is based on the LLVM [28] Compiler Framework version 10.0.0, where we modified the RISC-V backend by introducing our ISA extension and inserting new compilation passes at the very end of the compilation pipeline to not interfere with other passes that do not support our new instructions.”
- “Linker relaxation, however, is one optimization that could reduce the number of instructions by substituting calls with a short jumping distance by a single jump instruction instead of two instructions (auipc and jalr).”
- “Highly optimized code, such as cryptographic libraries, are barely affected by branch prediction at all.”
- “Throughout our benchmarks, the average code size overhead was 17%.”
- “In some cases, BasicBlocker even outperforms sophisticated branch predictors.”
summary
Quite good paper.
“Preventing speculative execution exploits” conjured up more sophisticated expectations on my part, but in general the idea is legit and the research was done properly. Essentially, the additional bb instruction annotates how many of the following instructions contain no control-flow instruction, which allows a processor to prefetch them without any speculative considerations. An additional lcnt instruction for RISC-V handles loops.
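To make the fetch behaviour tangible, here is a toy model of my own in Python (the program encoding, the names and the lookup of resolved branch targets are invented; this is not the paper's hardware design): every basic block is announced by a bb marker carrying the number of following instructions and a sequential flag, so those instructions can be fetched without speculation, and the fetcher only stalls at the end of a non-sequential block until the control-flow target is resolved.

program = [
    ("bb", 2, False),          # block 0: two instructions, ends in a branch
    ("addi", "x1, x1, 1"),
    ("bne", "x1, x2, L0"),     # target only known once the branch executes
    ("bb", 1, True),           # block 1: sequential, falls through to the next bb
    ("addi", "x3, x3, 1"),
    ("bb", 1, False),          # block 2: ends in an indirect jump
    ("jalr", "ra"),
]

def fetch_trace(program, resolved_targets):
    """Yield instructions in fetch order without ever fetching speculatively."""
    pc = 0
    while pc < len(program):
        kind, count, sequential = program[pc]
        assert kind == "bb", "control flow may only enter a block at its bb marker"
        for ins in program[pc + 1 : pc + 1 + count]:
            yield ins                      # guaranteed to retire, safe to fetch
        if sequential:
            pc = pc + 1 + count            # next bb follows directly in memory
        else:
            pc = resolved_targets[pc]      # stall until execute resolves the target

# pretend the branch in block 0 falls through and the final jalr leaves the program
targets = {0: 3, 5: len(program)}
for ins in fetch_trace(program, targets):
    print(ins)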
In the reading group it was criticized that the formal definitions were cumbersome and inappropriate, but I consider it a strong suit that the idea was not only presented but formalized. I think the negative aspects are that some statements are not properly attributed, like Observation 1, which is not a contribution of this paper but prior work. Furthermore, statements like “One could easily extend our design to larger, more complex CPUs” seem unjustified. Intel Skylake CPUs are built with 19-stage pipelines, which certainly show different performance characteristics, so more research into the relation between the number of pipeline stages and performance is required. A new instruction type for RISC-V is also a non-trivial modification in practice. The corresponding code is not yet published, but the paper was released just 1 month prior to reading. Another weak point is the selling, which focuses on the good performance of BasicBlocker in the nettle-aes and nettle-sha benchmarks. As cryptographic applications, these obviously do not rely on branching except for “for loops”, which is completely non-representative, yet statements like “In some cases, BasicBlocker even outperforms sophisticated branch predictors” can be found. Finally, Claudio pointed out very well that JIT compilation cannot be implemented with BasicBlocker, since the number of instructions in a block is most likely not known.
Overall, a quite good paper, but less aggressive selling would have increased reputation from my perspective.
- A branch delay slot means the instruction placed after a jump is executed regardless of the branch outcome; the compiler fills this slot with an instruction that has no data dependency on the branch (typically one moved from before the jump). This is an older concept than out-of-order execution.
- Definitions (page 4):
- Definition 1: Microarchitectural Effects
- Definition 2: Retired Instructions
- Definition 3: Instruction Stream
- Definition 4: Control Flow Instructions
- Definition 5: Basic Block
- Definition 6: t-security
- Definitions (page 5 & 7):
- Definition 7: Hardware secure processor
- Definition 8: BB Instruction
- Definition 9: BB Delayed Branches
- Definition 10: BB Exceptions
- Definition 11: BB Required
- Definition 12: BB Prefetching
- “It would also be possible to integrate our solution into a secure enclave by providing a modified fetch unit for the enclave.”
- “configuration with five stages and 4096 byte, one-way instruction- and data cache.”
- “Obviously, BasicBlocker slightly increases the code size as every basic block is extended by an additional instruction” with “average code size overhead was 17%”
- Gem5 project: “The gem5 simulator is a modular platform for computer-system architecture research, encompassing system-level architecture as well as processor microarchitecture. gem5 is a community led project with an open governance model”
- https://en.wikichip.org/wiki/WikiChip
Benchmarking Post-quantum Cryptography in TLS §
Title: “Benchmarking Post-quantum Cryptography in TLS” by Christian Paquin, Douglas Stebila, Goutam Tamvada [url] [dblp]
Published in 2020-04 at PQCRYPTO 2020 and I read it in 2021-06
Abstract: Post-quantum cryptographic primitives have a range of tradeoffs compared to traditional public key algorithms, either having slower computation or larger public keys and ciphertexts/signatures, or both. While the performance of these algorithms in isolation is easy to measure and has been a focus of optimization techniques, performance in realistic network conditions has been less studied. Google and Cloudflare have reported results from running experiments with post-quantum key exchange algorithms in the Transport Layer Security (TLS) protocol with real users’ network traffic. Such experiments are highly realistic, but cannot be replicated without access to Internet-scale infrastructure, and do not allow for isolating the effect of individual network characteristics.
quotes
- “In this work, we develop and make use of a framework for running such experiments in TLS cheaply by emulating network conditions using the networking features of the Linux kernel.”
- “Among our key results, we observe that packet loss rates above 3–5% start to have a significant impact on post-quantum algorithms that fragment across many packets, such as those based on unstructured lattices.”
- “We can see at least three major lines of work: (draft) specifications of how post-quantum algorithms could be integrated into existing protocol formats and message flows [9,17,33,34,37,41]; prototype implementations demonstrating such integrations can be done [6–8,15,19,20,30,31] and whether they would meet existing constraints in protocols and software [10]; and performance evaluations in either basic laboratory network settings [6,7] or more realistic network settings [8,15,19,21,22].”
- draft specifications:
  - Campagna, M., Crockett, E.: Hybrid Post-Quantum Key Encapsulation Methods (PQ KEM) for Transport Layer Security 1.2 (TLS).
  - Kiefer, F., Kwiatkowski, K.: Hybrid ECDHE-SIDH key exchange for TLS.
  - Schanck, J.M., Stebila, D.: A Transport Layer Security (TLS) extension for establishing an additional shared secret.
  - Schanck, J.M., Whyte, W., Zhang, Z.: Quantum-safe hybrid (QSH) ciphersuite for Transport Layer Security (TLS) version 1.2.
  - Stebila, D., Fluhrer, S., Gueron, S.: Design issues for hybrid key exchange in TLS 1.3.
  - Whyte, W., Zhang, Z., Fluhrer, S., Garcia-Morchon, O.: Quantum-safe hybrid (QSH) key exchange for Transport Layer Security (TLS) version 1.3.
- prototype implementations:
  - Bos, J.W., Costello, C., Ducas, L., Mironov, I., Naehrig, M., Nikolaenko, V., Raghunathan, A., Stebila, D.: Frodo: take off the ring! practical, quantum-secure key exchange from LWE.
  - Bos, J.W., Costello, C., Naehrig, M., Stebila, D.: Post-quantum key exchange for the TLS protocol from the ring learning with errors problem.
  - Braithwaite, M.: Experimenting with post-quantum cryptography.
  - Kampanakis, P., Sikeridis, D.: Two post-quantum signature use-cases: Non-issues, challenges and potential solutions.
  - Kwiatkowski, K., Langley, A., Sullivan, N., Levin, D., Mislove, A., Valenta, L.: Measuring TLS key exchange with post-quantum KEM.
  - Langley, A.: CECPQ2.
  - Open Quantum Safe Project. OQS-OpenSSL.
- analysis:
  - Crockett, E., Paquin, C., Stebila, D.: Prototyping post-quantum and hybrid key exchange and authentication in TLS and SSH.
- evaluation:
  - Bos, J.W., Costello, C., Ducas, L., Mironov, I., Naehrig, M., Nikolaenko, V., Raghunathan, A., Stebila, D.: Frodo: take off the ring! practical, quantum-secure key exchange from LWE.
  - Bos, J.W., Costello, C., Naehrig, M., Stebila, D.: Post-quantum key exchange for the TLS protocol from the ring learning with errors problem.
  - Braithwaite, M.: Experimenting with post-quantum cryptography.
  - Kampanakis, P., Sikeridis, D.: Two post-quantum signature use-cases: Non-issues, challenges and potential solutions.
  - Kwiatkowski, K., Langley, A., Sullivan, N., Levin, D., Mislove, A., Valenta, L.: Measuring TLS key exchange with post-quantum KEM.
  - Langley, A.: Post-quantum confidentiality for TLS.
  - Langley, A.: Real-world measurements of structured-lattices and supersingular isogenies in TLS.
- “Our framework is inspired by the NetMirage [40] and Mininet [23] network emulation software, and uses the Linux kernel’s networking stack to precisely and independently tune characteristics such as link latency and packet loss rate.”
- “Some of our key observations from the network emulation experiments measuring TLS handshake completion time are as follows. For the median connection, handshake completion time is significantly impacted by substantially slower algorithms (for example, supersingular isogenies (SIKE p434) has a significant performance floor compared to the faster structured and unstructured lattice algorithms), although this effect disappears at the 95th percentile. For algorithms with larger messages that result in fragmentation across multiple packets, performance degrades as packet loss rate increases: for example, median connection time for unstructured lattice key exchange (Frodo-640-AES) matches structured lattice performance at 5–10% packet loss, then begins to degrade; at the 95th percentile, this effect is less pronounced until around 15% packet loss. We see similar trends for post-quantum digital signatures, although with degraded performance for larger schemes starting around 3–5% packet loss since a TLS connection includes multiple public keys and signatures in certificates.”
- “Our experiments focused on post-quantum-only authentication, rather than hybrid authentication. We made this choice because, with respect to authenticating connection establishment, the argument for a hybrid mode is less clear: authentication only needs to be secure at the time a connection is established (rather than for the lifetime of the data as with confidentiality). Moreover, in TLS 1.3 there is no need for a server to have a hybrid certificate that can be used with both post-quantum-aware and non-post-quantum aware clients, as algorithm negotiation will be complete before the server needs to send its certificate.”
| Algorithm | Public key (bytes) | Ciphertext (bytes) | Key gen. (ms) | Encaps. (ms) | Decaps. (ms) |
|---|---|---|---|---|---|
| ECDH NIST P-256 | 64 | 64 | 0.072 | 0.072 | 0.072 |
| SIKE p434 | 330 | 346 | 13.763 | 22.120 | 23.734 |
| Kyber512-90s | 800 | 736 | 0.007 | 0.009 | 0.006 |
| FrodoKEM-640-AES | 9,616 | 9,720 | 1.929 | 1.048 | 1.064 |

- “We used liboqs for the implementations of the post-quantum algorithms; liboqs takes its implementations directly from teams’ submissions to NIST or via the PQClean project [16].”
- “The goal of the emulated network experiments was to measure the time elapsed until completion of the TLS handshake under various network conditions. Following the procedure in Sect. 3, we created two network namespaces and connected them using a veth pair, one namespace representing a client, and the other a server. In the client namespace, we ran a modified version of OpenSSL’s s_time program, which measures TLS performance by making, in a given time period, as many synchronous (TCP) connections as it could to a remote host using TLS; our modified version (which we’ve called s_timer), for a given number of repetitions, synchronously establishes a TLS connection using a given post-quantum algorithm, closes the connection as soon as the handshake is complete, and records only the time taken to complete the handshake. In the server namespace, we ran the nginx [28] web server, built against OQS-OpenSSL 1.1.1 so that it is post-quantum aware.”
- “For context, telemetry collected by Mozilla on dropped packets in Firefox (nightly 71) in September and October 2019, indicate that, on desktop computers, packet loss rates above 5% are rare: for example, in the distribution of WEBRTC_AUDIO_QUALITY_OUTBOUND_PACKETLOSS_RATE, 67% of the 35.5 million samples collected had packet loss less than 0.1%, 89% had packet loss less than 1%, 95% had packet loss less than 4.3%, and 97% had packet loss less than 20%”
- “Finally, for each combination of round-trip time and packet loss rate, and for each algorithm under test, 40 independent s_timer “client” processes were run, each making repeated synchronous connections to 21 nginx worker processes, each of which was instructed to handle 1024 connections”
- “Given that these are data-centre-to-data-centre links, the packet loss on these links is practically zero.”
- “The Apache Benchmark (ab) tool [2] was installed on the client VM to measure connection time;”
- “At the median, over high quality network links (packet loss rates ≤ 1%), we observe that public key and ciphertext size have little impact on handshake completion time, and the predominant factor is cryptographic computation time:”
- “This is to be expected since the maximum transmission unit (MTU) of an ethernet connection is 1500 bytes whereas Frodo-640-AES public key and ciphertext sizes are 9616 bytes and 9720 bytes respectively, resulting in fragmentation across multiple packets.”
- “16 IP packets must be sent by the client to establish a TLS connection using ecdh-p256-frodo640aes.”
summary
Neat experiment. The Open Quantum Safe project's implementation is used to run a network experiment on Linux virtual network devices, where packet loss and round-trip times are the considered variables. The key finding is obvious, but the collected data is provided publicly, enabling future research. It was very interesting to read Firefox telemetry's reference packet loss data.
- Equivalent actual reference data for data centers would be great
- The maximum Ethernet frame is 1518 bytes (a 1500-byte MTU plus 18 bytes of header and checksum). Devices must be able to handle datagrams of size ≥576 bytes (IPv4) or ≥1280 bytes (IPv6)
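The emulation technique itself (two network namespaces joined by a veth pair, with netem adding latency and loss) can be reproduced with stock iproute2 tooling. A minimal sketch, assuming a Linux host and root privileges; the namespace and interface names are made up, and this is not the authors' framework, only the underlying kernel mechanism:

import subprocess

def sh(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

RTT_MS, LOSS_PCT = 100, 5

sh("ip netns add pqclient")
sh("ip netns add pqserver")
sh("ip link add cl0 type veth peer name sv0")
sh("ip link set cl0 netns pqclient")
sh("ip link set sv0 netns pqserver")
sh("ip netns exec pqclient ip addr add 10.0.0.1/24 dev cl0")
sh("ip netns exec pqserver ip addr add 10.0.0.2/24 dev sv0")
sh("ip netns exec pqclient ip link set cl0 up")
sh("ip netns exec pqserver ip link set sv0 up")

# emulate the link in both directions: half the round-trip time per direction, plus loss
for ns, dev in (("pqclient", "cl0"), ("pqserver", "sv0")):
    sh(f"ip netns exec {ns} tc qdisc add dev {dev} root netem "
       f"delay {RTT_MS // 2}ms loss {LOSS_PCT}%")

# a TLS server and client (nginx and s_timer in the paper) would now be started
# inside the two namespaces, e.g. "ip netns exec pqserver nginx ..."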
Crouching error, hidden markup [Microsoft Word] §
Title: “Crouching error, hidden markup [Microsoft Word]” by N. Holmes [url] [dblp]
Published in 2001-09 and I read it in 2021-08
Abstract:
quotes
- “That a reputable and widely read—if sardonic—column should end with the words ‘There’s money to be made for old rope in IT’ reflects badly on the computing profession.”
- “By all reports—such as Ted Lewis’s ‘Fast, Expensive, and Horribly Complex’ (Computer, Sept. 1999, pp. 120, 118-119)—the poor quality of software plays a salient role in the poor quality of service that computer users so frequently complain about.”
- “Why have those standards been ineffective in promoting software usability and serviceability? First, it may be that the standards we have are inappropriate. Second, even if appropriate standards do exist, it may be that the computing industry has yet to accept them.”
- “As a long-term user of formatting programs such as Script, Roff, and TeX, I felt apprehensive about using Word’s radically different approach, but as every working group had opted to produce its report using Word, I had no choice.”
- “The problem’s true nature became clear when I chanced upon ‘Lilac: A Two-View Document Editor’ by Kenneth Brooks (Computer, June 1991, pp. 7-19). Brooks described a system that combines overt markup with WYSIWYG.”
- “Markup conventions have a rich history. If you take a long-term view, markup conventions have been used in the data processing industry for thousands of years. Markup is conventional annotation designed to convey guidance to the user of plain text about the text’s intended treatment: This guidance originally applied to how the text should be read aloud and is otherwise known as punctuation.”
- “Writers rarely inserted spaces between words until Irish scribes in the late seventh century found it convenient to abandon the traditional scriptio continua.”
- “Programs such as Script and Roff proved quite useful on the relatively crude printers available at the time.”
- “Fortunately, the markup problem caught the attention of Donald Knuth. He studied the typographical tradition and brilliantly adapted the computer industry’s pathetic 7-bit character set to handle text markup capable of producing documents that would make a professional compositor proud. Knuth’s markup system, called TeX (http://www.tug.org), provides the kind of typesetting system that would make a superb basis for a controllable WYSIWYG document editor. It’s versatile, extensible, thorough, and traditional. It adopts rather than ignores the printing industry’s traditions. It’s also quite widely used by professionals such as mathematicians, who have their own particular problems with typography.”
- “Curiously, at the heart of HTML’s markup style is a punctuation symbol called the diple (pronounced to rhyme with Ripley)—a quotation mark of scribes that the printing industry developed into several, all absent from both ASCII and EBCDIC.”
- “XML is to HTML what Unicode is to the ASCII character set. Just as Unicode ignored the writing system that ASCII so poorly served, and went on to curdle the world’s languages, so XML has ignored the document and gone on to curdle its bibliography.”
summary
In this article, Holmes derives the requirements for a modern text formatting system from his personal experiences. He favors a markup language over the opaque data model of WYSIWYG editors. He concludes that a dual system is the preferred user-centric way to go, whereas Teχ is praised for its visual result.
I like his fundamental discussion of standards and former text formatting tools. His preference for Lilac seems understandable, but lacks depth. While the concept was also applied to the HTML markup language in Macromedia's Dreamweaver, the situation becomes increasingly difficult with a larger data model and a more powerful layouting scheme. CSS allows users to place objects at defined pixel positions, which makes it difficult to select and edit elements in a visual editor once elements get close to each other. One issue the author completely ignores is Teχ's nature as a representational notation (in contrast to HTML). As such, Teχ is less a markup language than a sequence of instructions to generate a document.
Cryptanalysis of ring-LWE based key exchange with key … §
Title: “Cryptanalysis of ring-LWE based key exchange with key share reuse” by Scott Fluhrer [url] [dblp]
Published in 2016 and I read it in 2020-06
Abstract:
quotes
- “With Diffie-Hellman, it is perfectly safe to reuse the same public key share for multiple exchanges. One such use is the ‘ephemeral-static’ mode; in this case, Alice might select a private value, and publish the corresponding key share.”
- “The idea behind this protocol is that Alice computes V = SS'A + SE', while Bob computes V' = SS'A + S'E, they differ by SE' - S'E, as S, S', E, E' are small elements”
- “She deliberately selects S', E' so that coefficient 0 of Alice’s computation of US is near 0 (that is, near the boundary of quadrants I and IV; actually any of the quadrant boundaries could be used). For coefficient 0, Eve sets that error-reconcliation bit to indicate that the values in the range [0, p/2) are mapped to one bit value, while values in [p/2, p − 1] are mapped to the other.”
- “The first step in the attack for Eve is to find a lightweight value S' where (SS' A)[0] = ±1.”
- “The above can be done with perhaps 4,000 queries (assuming a ring size of N = 1024 and assuming that the value of S was generated using a discrete Gaussian Distribution with a standard deviation of circa 3).”
- “One place where this can potentially come up is in the current TLS 1.3 draft[6]. In this draft, they allow a server to declare a ’static keyshare’.”
- “The TLS 1.3 draft uses either DH or ECDH, which are both safe when used in this manner. However, if one were to replace the DH or ECDH with a ring-LWE based key exchange, this would become insecure.”
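For reference, the key-agreement relation quoted above can be written out as follows; since S, S', E, E' are all small, V and V' land close together and normally reconcile to the same bits:

\begin{aligned}
V  &= S S' A + S E' \\
V' &= S S' A + S' E \\
V - V' &= S E' - S' E
\end{aligned}

Eve's queries deliberately place coefficient 0 of Alice's value right at a reconciliation boundary, so whether the derived bits still match depends on the sign of this small difference, and that dependence is what leaks information about S (my paraphrase of the quoted attack description).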
summary
typos
- “then it will be necessary for a fresh key share be generated for each exchange,” → “then it will be necessary for a fresh key share to be generated for each exchange,”
- Alice selects ”small” elements → Alice selects “small” elements
- each element of V is ”close” to the corresponding → each element of V is “close” to the corresponding
- “Eve’s goal is to recover the value S the corresponds to Alice’s public key” → “Eve’s goal is to recover the value S that corresponds to Alice’s public key”
- “Each time after Alice and Eve has performed the key exchange protocol,” → “Each time after Alice and Eve have performed the key exchange protocol,”
- “Eve when then be able to generate one guess to Alice’s shared secret,” → “Eve will then be able to generate one guess to Alice’s shared secret,”
- “(where the notation F[i] specifies coefficient i-th of the ring element F),” → “(where the notation F[i] specifies coefficient i of the ring element F),”
- “She can do this by searching for values S' which consists of at most three coefficents are [1, −1] and the rest 0,” → “She can do this by searching for values S' which consist of at most three coefficients that are [1, −1] and the rest 0,”
Cryptographic competitions §
Title: “Cryptographic competitions” by Daniel J Bernstein [url] [dblp]
Published in 2020-12 and I read it on 2021-06-20
Abstract: Competitions are widely viewed as the safest way to select cryptographic algorithms. This paper surveys procedures that have been used in cryptographic competitions, and analyzes the extent to which those procedures reduce security risks.
quotes
- “DES, the output of the first cryptographic competition, had an exploitable key size (see [47], [60], [113], [30], and [52]), had an exploitable block size (see [78] and [29]), and at the same time had enough denials of exploitability (see, e.g., [61], [46, Section 7], [63], and [1]) to delay the deployment of stronger ciphers for decades. As another example, AES performance on many platforms relies on table lookups with secret indices (“S-table” or “T-table” lookups), and these table lookups were claimed to be “not vulnerable to timing attacks” (see [45, Section 3.3] and [83, Section 3.6.2]), but this claim was incorrect (see [16] and [104]), and this failure continues to cause security problems today (see, e.g., [39]). As a third example, SHA-3 was forced to aim for a useless 2^512 level of preimage security, and as a result is considerably larger and slower than necessary, producing performance complaints and slowing down deployment (see, e.g., [73])—which is a security failure if it means that applications instead use something weak (see, e.g., [76]) or nothing at all.”
- “If I set a speed record for some computation, am I doing it just for the thrill? Does the speed record actually matter for users? Making software run faster is a large part of my research; I want to believe that this is important, and I have an incentive to exaggerate its importance.”
- “Android had required storage encryption since 2015 except on ‘devices with poor AES performance (50 MiB/s and below)’.”
- “A recent paper ‘Post-quantum authentication in TLS 1.3: a performance study’ [99] states ‘Reports like [1] lead us to believe that hundreds of extra milliseconds per handshake are not acceptable’.”
- “Akamai’s underlying report [7] says, as one of its ‘key insights’, that ‘just a 100-millisecond delay in load time hurt conversion rates by up to 7%’.”
- “Could it be that what matters for sales isn’t actually the last bit of speed in delivering the web page, but rather the content of the web page?”
- “Perhaps there are other confounding factors, such as lower-income customers buying fewer products and also tending to have slower network connections.”
- “Even if hundreds of extra milliseconds per page load are unacceptable, it is an error to conflate this with hundreds of extra milliseconds per handshake. A TLS handshake sets up a session that can be, and often is, used to load many pages without further handshakes.”
- “All of the signature systems listed in [99, Table 1] have software available that signs in under 20 milliseconds on a 3GHz Intel Haswell CPU core from 2013 (never mind the possibility of parallelizing the computation across several cores).”
- “The total size of a public key and a signature in [99, Table 1] is at most 58208 bytes, so a 100Mbps network connection (already common today, never mind future trends) can transmit 4 public keys and 4 signatures in 20 milliseconds. For comparison, [62, “Total Kilobytes”] shows that the median web page has grown to 2MB, and that the average is even larger.”
- “Other major web servers in 2017 used initcwnd ranging from 10 through 46, according to [36].”
- “In the opposite direction, compared to a web server taking initcwnd as 46, a web server taking initcwnd as just 10 is sacrificing two round trips, almost half a second for a connection between the US and Singapore.”
- “[37] also said that Keccak ‘relies on completely different architectural principles from those of SHA-2 for its security’.”
- “Comparing the symmetric competitions shows trends towards larger and more complex inputs and outputs in the cryptographic algorithm interfaces. DES has a 64-bit block size; AES has a 128-bit block size. Stream ciphers encrypt longer messages. Hash functions hash longer messages. […] This does not mean that complexity is a goal per se: symmetric algorithms with larger interfaces often reach levels of efficiency that seem hard to achieve with smaller interfaces, and there are some security arguments for larger interfaces.”
- “During the AES competition, Biham [31, Table 3] reported that ‘the speed of the candidate ciphers on Pentium 133MHz MMX’ was
- 1254 cycles for Twofish,
- 1276 cycles for Rijndael,
- 1282 cycles for CRYPTON,
- 1436 cycles for RC6,
- 1600 cycles for MARS,
- 1800 cycles for Serpent,”
- “Biham also reported speeds scaled to “proposed minimal rounds”: 956 cycles for Serpent (17 rounds rather than 32), 1000 cycles for MARS (20 rounds rather than 32), 1021 cycles for Rijndael (8 rounds rather than 10),”
- “Why did these reports end up with such different numbers? And why did NIST’s AES efficiency testing [80] feature CRYPTON as the fastest candidate in its tables and its graphs, 669 Pentium Pro cycles to encrypt, with Rijndael needing 809 Pentium Pro cycles to encrypt?”
- “for example, the Rijndael encryption algorithm mapping a 128-bit plaintext and a 128-bit key to a 128-bit ciphertext.”
- “The benchmarking mechanism varies, for example in the handling of per input timing variations, initial code-cache-miss slowdowns, operating-system interrupts, clock-frequency variations, and cycle-counting overheads.”
- “The advertisement mechanism varies. As an example, measurements that are slower than previous work are likely to be suppressed if the advertisement mechanism is a paper claiming to set new speed records, but less likely to be suppressed if the advertisement mechanism is a paper claiming to compare multiple options.”
- “For example, the OpenSSL cryptographic library contains 26 AES implementations, almost all in assembly language, in a few cases weaving AES computations together with common hash-function computations. The library checks the target platform and selects an implementation accordingly.”
- “Compare [75], which estimates that AES produced $250 billion in worldwide economic benefits between 1996 and 2017.”
- “NIST […] in particular refused to list speeds of the fast Serpent implementations from [85] and [54], since those implementations had been constructed by ‘1000 hours of execution of search programs’ and ‘do not necessarily port to different platforms’.”
- “The costs of cryptographic optimization—in human time or computer time—can be huge obstacles for submitters without serious optimization experience.”
- “As part of the eSTREAM competition, Christophe De Cannière developed a new API for stream-cipher software, and wrote a new benchmarking toolkit [34] to measure implementations supporting the API. […] Notably, it tried more compiler options; it supported assembly-language software; and it was published.”
- “In 2008, eSTREAM was drawing to a close, and the SHA-3 competition had been announced. Lange and I started eBACS, a unified benchmarking project that includes eBASC for continued benchmarking of stream ciphers, eBASH for benchmarking of hash functions, and eBATS. We replaced BATMAN, the original benchmarking toolkit for eBATS, with a new benchmarking toolkit, SUPERCOP. By late 2009, eBASH had collected 180 implementations of 66 hash functions in 30 families. 7 eBASH became the primary source of software-performance information for the SHA-3 competition. See [37].”
- “The SUPERCOP API was carefully extended to handle more operations, such as authenticated encryption. CAESAR, NISTPQC, and NISTLWC required submissions to provide software using the SUPERCOP API. SUPERCOP now includes 3716 implementations of 1255 cryptographic functions in hundreds of families. See [27].”
- “Risk #1 of cryptography is that the cryptography isn’t used.”
- “How, then, is a competition for top speed not the same as a competition for minimum security?”
- “For example, say the efficiency metric is bit operations per bit of plaintext to encrypt a long stream; and say the minimum allowed security level is 2^128. My understanding of current attacks is that Serpent reaches this security level with 12 rounds, using about 75 operations per bit; Rijndael reaches this security level with 8 rounds, using about 160 operations per bit; and Salsa20 reaches this security level with 8 rounds, using 54 operations per bit. If these are the competitors then Salsa20 wins the speed competition. A user who can afford, say, 80 operations per bit then takes 12 rounds of Salsa20 (78 operations per bit). The same user would also be able to afford 12 rounds of Serpent, but 12 rounds of Salsa20 provide a larger security margin, presumably translating into a lower risk of attack.”
- “Serpent proposed more than twice as many rounds as necessary and had the whole proposal dismissed as being too slow.”
- “It was announced in 2012 that malware called “Flame” had been exploiting MD5 collisions since at least 2010; the analysis of [100] concluded that the Flame attackers had used an ‘entirely new and unknown’ variant of [101] (meaning new from the public perspective), that the Flame design ‘required world-class cryptanalysis’, and that it was ‘not unreasonable to assume’ that this cryptanalysis predated [101].”
- “As another example, the Rijndael designers and NIST claimed that Rijndael was ‘not vulnerable to timing attacks’, but this was incorrect, as noted in Section 1: AES was then broken by cache-timing attacks.”
- Fig.3.4: “Competitions:
- DES: the Data Encryption Standard (1974–1976)
- AES: the Advanced Encryption Standard (1998–2000)
- eSTREAM: the ECRYPT Stream Cipher Project (2005–2008)
- SHA-3: a Secure Hash Algorithm (2008–2012)
- CAESAR: Competition for Authenticated Encryption: Security, Applicability, and Robustness (2014–2019)
- NISTPQC: NIST Post-Quantum Cryptography Standardization Project (2017–?)
- NISTLWC: NIST Lightweight Cryptography Standardization Project (2019–?)”
- “CAESAR selection decisions will be made on the basis of published analyses. If submitters disagree with published analyses then they are expected to promptly and publicly respond to those analyses. Any attempt to privately lobby the selection-committee members is contrary to the principles of public evaluation and should be expected to lead to disqualification.”
- “I don’t know how to prove that factoring in expert judgments is more reliable than simply taking the fastest unbroken algorithm. Maybe it isn’t—or maybe there’s a better approach. It would be beneficial for the cryptographic community to put more effort into analyzing and optimizing risk-management techniques.”
- “Regarding accusations that IBM and NSA had ‘conspired’, Tuchman said ‘We developed the DES algorithm entirely within IBM using IBMers. The NSA did not dictate a single wire!’”
- “In 1979, NSA director Bobby Inman gave a public speech [63] including the following comments: ‘First, let me set the record straight on some recent history. NSA has been accused of intervening in the development of the DES and of tampering with the standard so as to weaken it cryptographically. This allegation is totally false.’”
- “However, an internal NSA history book ‘American cryptology during the cold war’ tells a story [66, pages 232–233] of much heavier NSA involvement in DES: […]”
- “‘NSA worked closely with IBM to strengthen the algorithm against all except brute force attacks and to strengthen substitution tables, called S-boxes. Conversely, NSA tried to convince IBM to reduce the length of the key from 64 to 48 bits. Ultimately, they compromised on a 56-bit key.’”
- “See, e.g., [13], describing NSA’s efforts between 2014 and 2018 to convince ISO to standardize Simon and Speck. One can only guess how many more algorithm-selection processes NSA was influencing through proxies in the meantime.”
- “Being more careful than whatever is required for a publication is taking time away from writing more papers, and as a community we want a sufficiently steady stream of broken cryptosystems as continued fuel for the fire.”
- “More broadly, performance seems to be the most powerful weapon we have in the fight against ideas for reducing security risks. Performance constantly drives us towards the edge of disaster, and that’s what we want.”
- “The challenge for the community is to figure out whether we can maintain success in what we’re paid to do without a neverending series of security failures.”
summary
Very interesting meta-level read with many historical remarks. As an organizer of cryptographic competitions, djb also has the necessary background information to share. Performance as a criterion is broadly discussed in the first half. The contextualization of running competitions is done nicely. Personal remarks can be found more towards the end. But the paper cannot offer solutions to the posed (difficult) problems.
Hitchhiker's guide to the paper:
- pages 3–6 go into the details of (potentially false) advertising of speed as an important factor
- pages 7–8 discuss network-level aspects of speed
- pages 10–12 explain the difficulty of proper benchmarking
- page 14 “Around that time the Internet was reported to be communicating roughly 2^58 bytes per year, more than doubling every year.” ⇒ citation would be nice
- pages 15–16 discuss benchmarking platforms
- page 17 discussion of the security margin
- page 19 contains a thought experiment
Uses Bayes' theorem: P(A | B) is shown, where P(A) is the probability that the candidate gets broken and P(B) is the probability that the scheme is not broken within 36 months
- page 21 / Fig 3.4 lists competitions systematically
- page 25–27 debates NSA involvement
- page 29 motivates community's role
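For reference, the page-19 thought experiment noted above plugs the two events into Bayes' theorem (my reading of the note, not a formula copied from the paper), with A = “the candidate gets broken (eventually)” and B = “the scheme is not broken within 36 months”:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}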
Significant quotes:
- “Could it be that what matters for sales isn’t actually the last bit of speed in delivering the web page, but rather the content of the web page?”
- “Compare [75], which estimates that AES produced $250 billion in worldwide economic benefits between 1996 and 2017.”
- “The costs of cryptographic optimization—in human time or computer time—can be huge obstacles for submitters without serious optimization experience.”
- “SUPERCOP now includes 3716 implementations of 1255 cryptographic functions in hundreds of families”
- “Risk #1 of cryptography is that the cryptography isn’t used.”
- “How, then, is a competition for top speed not the same as a competition for minimum security?”
- “Being more careful than whatever is required for a publication is taking time away from writing more papers, and as a community we want a sufficiently steady stream of broken cryptosystems as continued fuel for the fire.”
- “More broadly, performance seems to be the most powerful weapon we have in the fight against ideas for reducing security risks. Performance constantly drives us towards the edge of disaster, and that’s what we want.”
- “The challenge for the community is to figure out whether we can maintain success in what we’re paid to do without a neverending series of security failures.”
Cyclone: A safe dialect of C §
Title: “Cyclone: A safe dialect of C” by Greg Morrisett, James Cheney, Dan Grossman, Michael Hicks, Yanling Wang [url] [dblp]
Published in 2002 at USENIX 2002 and I read it in 2020-02
Abstract: Cyclone is a safe dialect of C. It has been designed from the ground up to prevent the buffer overflows, format string attacks, and memory management errors that are common in C programs, while retaining C’s syntax and semantics. This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.
Ambiguity
“Arrays and strings are converted to ?-pointers as necessary (automatically by the compiler).”
→ When exactly?
Example
“We don’t consider it an error if non-pointers are uninitialized. For example, if you declare a local array of non-pointers, you can use it without initializing the elements:
char buf[64]; // contains garbage ..
sprintf(buf,"a"); // .. but no err here
char c = buf[20]; // .. or even here
This is common in C code; since these array accesses are in-bounds, we allow them.”
Example
“However, this technique will not catch even the following simple variation:
char *itoa(int i) {
char buf[20];
char *z;
sprintf(buf,"%d",i);
z = buf;
return z;
}
Here, the address of buf is stored in the variable z, and then z is returned. This passes gcc -Wall without complaint.”
Quotes
- “Cyclone is a safe dialect of C. It has been designed from the ground up to prevent the buffer overflows, format string attacks, and memory management errors that are common in C programs, while retaining C’s syntax and semantics. This paper examines safety violations enabled by C’s design, and shows how Cyclone avoids them, without giving up C’s hallmark control over low-level details such as data representation and memory management.”
- “Every introductory C programming course warns against them and teaches techniques to avoid them, yet they continue to be announced in security bulletins every week. There are reasons for this that are more fundamental than poor training:
- One cause of buffer overflows in C is bad pointer arithmetic, and arithmetic is tricky. […]
- C uses NUL-terminated strings. […]
- Out-of-bounds pointers are commonplace in C. […]”
- “Our goal is to design Cyclone so that it has the safety guarantee of Java (no valid program can commit a safety violation) while keeping C’s syntax, types, semantics, and idioms intact.”
- “We must reject some safe programs, because it is impossible to implement an analysis that perfectly separates the safe programs from the unsafe programs.”
- Table 1: “Restrictions imposed by Cyclone to preserve safety”
- NULL checks are inserted to prevent segmentation faults
- Pointer arithmetic is restricted
- Pointers must be initialized before use
- Dangling pointers are prevented through region analysis and limitations on free
- Only “safe” casts and unions are allowed
- goto into scopes is disallowed
- switch labels in different scopes are disallowed
- Pointer-returning functions must execute return
- setjmp and longjmp are not supported
- Table 2: “Extensions provided by Cyclone to safely regain C programming idioms”
- Never-NULL pointers do not require NULL checks
- “Fat” pointers support pointer arithmetic with run-time bounds checking
- Growable regions support a form of safe manual memory management
- Tagged unions support type-varying arguments
- Injections help automate the use of tagged unions for programmers
- Polymorphism replaces some uses of void *
- Varargs are implemented with fat pointers
- Exceptions replace some uses of setjmp and longjmp
- “If you call getc(NULL), what happens? The C standard gives no definitive answer.”
- “Cyclone’s region analysis is intraprocedural — it is not a whole-program analysis.”
- “Here ‘r is a region variable.” (cf. Rust’s lifetime notation)
- “Obviously, programmers still need a way to reclaim heap-allocated data. We provide two ways. First, the programmer can use an optional garbage collector. This is very helpful in getting existing C programs to port to Cyclone without many changes. However, in many cases it constitutes an unacceptable loss of control.”
- “A goto that does not enter a scope is safe, and is allowed in Cyclone. We apply the same analysis to switch statements, which suffer from a similar vulnerability in C.”
- “The Cyclone compiler is implemented in approximately 35,000 lines of Cyclone. It consists of a parser, a static analysis phase, and a simple translator to C. We use gcc as a back end and have also experimented with using Microsoft Visual C++.”
- “When a user compiles with garbage collection enabled, we use the Boehm-Demers-Weiser conservative garbage collector as an off-the-shelf component.”
- “We achieve near-zero overhead for I/O bound applications such as the web server and the http programs, but there is a considerable overhead for computationally-intensive benchmarks;”
- “Cyclone’s representation of fat pointers turned out to be another important overhead. We represent fat pointers with three words: the base address, the bounds address, and the current pointer location (essentially the same representation used by McGary’s bounded pointers [26]).” (see the sketch after this quote list)
- “Good code generation can make a big difference: we found that using gcc’s -march=i686 flag increased the speed of programs making heavy use of fat pointers (such as cfrac and grobner) by as much as a factor of two, because it causes gcc to use a more efficient implementation of block copy.”
- “We found array bounds violations in three benchmarks when we ported them from C to Cyclone: mini_httpd, grobner, and tile. This was a surprise, since at least one (grobner) dates back to the mid 1980s.”
- “Cyclone began as an offshoot of the Typed Assembly Language (TAL) project”
- “In C, a switch case by default falls through to the next case, unless there is an explicit break. This is exactly the opposite of what it should be: most cases do not fall through, and, moreover, when a case does fall through, it is probably a bug. Therefore, we added an explicit fallthru statement,” […] “Our decision to “correct” C’s mistake was wrong. It made porting error-prone because we had to examine every switch statement to look for intentional fall throughs, and add a fallthru statement.”
- “There is an enormous body of research on making C safer. Most techniques can be grouped into one of the following strategies:”
- Static analysis. […]
- Inserting run-time checks. […]
- Combining static analysis and run-time checks. […]
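To make the fat-pointer representation quoted above more concrete, here is a minimal Rust sketch I wrote (my own illustration, not the paper’s C implementation): the three conceptual words are the base address, the bounds, and the current position, and every dereference performs a run-time bounds check, similar to Cyclone’s ?-pointers.
struct FatPtr<'a> {
    base: &'a [u8], // base address and bounds (a Rust slice carries both)
    cur: usize,     // current position, stored as an offset from base
}

impl<'a> FatPtr<'a> {
    // pointer arithmetic only moves `cur`; no check happens here,
    // so out-of-bounds pointers may exist without being dereferenced
    fn offset(&self, delta: isize) -> FatPtr<'a> {
        FatPtr { base: self.base, cur: (self.cur as isize + delta) as usize }
    }

    // every dereference is bounds-checked at run time
    fn deref(&self) -> Option<u8> {
        self.base.get(self.cur).copied()
    }
}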
Region blocks
“Therefore, Cyclone provides a feature called growable regions. The following code declares a growable region, does some allocation into the region, and deallocates the region:
region h {
int *x = rmalloc(h,sizeof(int));
int ?y = rnew(h) { 1, 2, 3 };
char ?z = rprintf(h,"hello");
}
The code uses a region block to start a new, growable region that lives on the heap. The region is deallocated on exit from the block (without an explicit free).”
Summary
Great paper with a pragmatic approach derived from the Typed Assembly Language project. Definitely worth a read. Essential for everyone interested in the Rust programming language, as this project inspired many ideas related to proper memory management and lifetimes. The implementation has to be compiled on your own, but it is not maintained anymore anyway. Furthermore, there are follow-up papers documenting the progress of the project; they also introduce more drastic changes that discard the label “pragmatic”.
Tagged unions
“We solve this in Cyclone in two steps. First, we add tagged unions to the language:”
tunion t {
Int(int);
Str(char ?);
};
[…]
void pr(tunion t x) {
switch (x) {
case &Int(i): printf("%d",i); break;
case &Str(s): printf("%s",s); break;
}
}
“The printf function itself accesses the tagged arguments through a fat pointer (Cyclone’s varargs are bounds checked)”
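Readers coming from Rust will recognize the pattern above: Cyclone’s tagged unions map almost directly onto Rust enums and pattern matching. A rough analogue of the tunion/pr example (my own sketch, not from the paper):
enum T {
    Int(i32),
    Str(String),
}

fn pr(x: &T) {
    match x {
        T::Int(i) => print!("{}", i), // tag dispatch via pattern matching
        T::Str(s) => print!("{}", s),
    }
}

fn main() {
    pr(&T::Int(3));
    pr(&T::Str(String::from("hi")));
}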
Detecting Unsafe Raw Pointer Dereferencing Behavior in… §
Title: “Detecting Unsafe Raw Pointer Dereferencing Behavior in Rust” by Zhijian Huang, Yong Jun Wang, Jing Liu [url] [dblp]
Published in 2018 at IEICE 2018 and I read it in 2021-11
Abstract: The rising systems programming language Rust is fast, efficient and memory safe. However, improperly dereferencing raw pointers in Rust causes new safety problems. In this paper, we present a detailed analysis into these problems and propose a practical hybrid approach to detecting unsafe raw pointer dereferencing behaviors. Our approach employs pattern matching to identify functions that can be used to generate illegal multiple mutable references (We define them as thief function) and instruments the dereferencing operation in order to perform dynamic checking at runtime. We implement a tool named UnsafeFencer and has successfully identified 52 thief functions in 28 real-world crates∗, of which 13 public functions are verified to generate multiple mutable references.
quotes
- “Our approach employs pattern matching to identify functions that can be used to generate illegal multiple mutable references (We define them as thief function) and instruments the dereferencing operation in order to perform dynamic checking at runtime.”
- “Thus, unsafe Rust feature is enabled to allow programmers perform dangerous operations: dereferencing a raw pointer, calling an unsafe function or method, accessing or modifying a mutable static variable and implementing an unsafe trait [2].”
- “There are two types of raw pointers in Rust: immutable raw pointer (*const T) and mutable raw pointer (*mut T).”
- “Values in Rust are scoped and bound to a unique owner (variable).”
- “Nicholas [6] introduced a scenario in which freed Box value can be accessed.”
- “We name these functions as thief functions and design a thief function pattern to model them. The pattern is made up of the following conditions:
- The return value of the function is a mutable reference or data containing mutable references as member fields.
- The input arguments of the function contain no mutable references.
- The function is not declared with unsafe.”
- “We implement the approach as a tool UnsafeFencer on Rust compiler plugin. UnsafeFencer is made up of two components: finder and fencer. The finder transverses the Abstract Syntax Tree (AST) to find thief functions matching the pattern.”
- “The code is available at https://github.com/qorost/unsafefencer.”
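A hedged sketch of the thief-function pattern described above (my own hypothetical example, not taken from the paper or its test suite): a function with a safe signature, no mutable-reference inputs, and a mutable-reference return value, which internally dereferences a raw pointer. Calling it twice yields two aliasing &mut references, exactly the situation the borrow checker normally rules out and that UnsafeFencer instruments for.
struct Wrapper {
    ptr: *mut i32,
}

impl Wrapper {
    fn new(v: &mut i32) -> Wrapper {
        Wrapper { ptr: v as *mut i32 }
    }

    // "thief function": not declared unsafe, takes no &mut argument,
    // yet returns a mutable reference derived from a raw pointer
    fn steal(&self) -> &mut i32 {
        unsafe { &mut *self.ptr }
    }
}

fn main() {
    let mut x = 1;
    let w = Wrapper::new(&mut x);
    let a = w.steal();
    let b = w.steal(); // second, aliasing &mut to the same i32 -- undefined behavior territory
    *a += 1;
    *b += 1;
    println!("{}", *a);
}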
summary
A nice, specific Rust compiler plugin to detect a special case in which multiple mutable references are generated in the context of unsafe code. Its practicality is limited, but the authors provide pragmatic tooling to make it as accessible as possible.
typos/problems
- Box<unsize> should be Box<usize>
- Caption Fig. 1: Examples
- Fig. 2: Shdowi32 should be Shadowi32
- “We downloaded all the available crate (as of May. 2017)” → “We downloaded all the available crates (as of May. 2017)”
- “Due to the inappropriate configuration of running environment, 5,703 crates were able to run the experiments successfully.” is an incomprehensible sentence to me
- In Figure 4, the text is too small
EWD1300: The Notational Conventions I Adopted, and Why §
Title: “EWD1300: The Notational Conventions I Adopted, and Why” by Edsger W. Dijkstra [url] [dblp]
Published in 2002 and I read it in 2021-09
Abstract:
quotes
- “Without much hesitation I have decided to stick to the usual infix operators.”
- “It made me realize why I like it so much for associative operators: it allows us to write p + q + r without being forced to choose between (p + q) + r and p + (q + r): in prefix notation, the choice between + + pqr and + p + qr would have been unavoidable.”
- “Only use priority rules that are frequently appealed to, and hence are familiar.”
- “Do not introduce priority rules that destroy symmetry.”
- “When you have the freedom, choose the larger symbol for the operator with the lower binding power.”
- “Surround the operators with the lower binding power with more space than those with a higher binding power.”
- “For unary operators, give them the highest binding power and stick to prefix operators.”
- “I learned to appreciate expressions built with operators as a way of avoiding functional notation.”
- “A. N. Whitehead made many wise remarks it is a pleasure to agree with, but I cannot share his judgement when he applauds the introduction of the invisible multiplication sign. The multiplication being so common, he praises the mathematical community for the efficiency of its convention, but he ignores the price. The one price is confusion: look at the different semantics of the juxtapositions in {3 1/2, 3y, 32}. Is it a wonder that little children (many of whom have a most systematic mind) get confused by the mathematics they are taught?”
- “Brevity is much more effectively obtained by macroscopic measures such as avoiding duplication, case analysis and superfluous nomenclature, than by such microscopic measures as making the multiplication sign invisible.”
- “So, instead of the traditional f(x) I now write f.x.”
- “The convention that function application is left-associative has been adopted almost universally.”
- “[…] postulating that functional composition has a greater binding power than functional application.”
- “<i: i < 10: i²>
The explicit enumeration of the dummies between < and the first : acts like ALGOL 60’s declaration of the local variables of an inner block.”
- “We therefore present the calculation in the following format
A
→ {hint why A→B}
B
→ {hint why B→C}
C”
(a small worked instance follows after this quote list)
- “But personally I am in favour of making the hints as clear and helpful as possible.”
- “For the formula number’s place the left margin is to be preferred over the right margin, because then it is easier to maintain a small distance between the number and the formula it numbers. (A similar remark applies to page numbers in tables of contents.)”
- “In addition I have chosen to use a pair of square brackets to denote for a boolean function universal quantification over its domain: [b] ≡ ∀t :: b . t .”
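A tiny worked instance of the calculation format quoted above (my own example, not Dijkstra’s), written as a LaTeX array:
\begin{array}{cl}
            & x \geq 2 \\
\rightarrow & \{\text{hint: multiply both sides by } x > 0\} \\
            & x^2 \geq 2x \\
\rightarrow & \{\text{hint: } x \geq 2 \text{, hence } 2x = x + x \geq x + 2\} \\
            & x^2 \geq x + 2
\end{array}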
summary
A nice read summing up opinions related to mathematical notation. Prefix/Infix notation, precedence rules, and the layout of proofs are discussed. Details are often lacking, but I expected it to be only some opinion paper.
Engineering a sort function §
Title: “Engineering a sort function” by Jon L. Bentley, M. Douglas McIlroy [url] [dblp]
Published in 1993 at Software - Practice and Experience and I read it in 2022-12-29
Abstract: We recount the history of a new qsort function for a C library. Our function is clearer, faster and more robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a novel solution to Dijkstra’s Dutch National Flag problem; and it swaps efficiently. Its behavior was assessed with timing and debugging testbeds, and with a program to certify performance. The design techniques apply in domains beyond sorting.
quotes
- “C libraries have long included a qsort function to sort an array, usually implemented by Hoare’s Quicksort.1 Because existing qsorts are flawed, we built a new one. This paper summarizes its evolution.” (Bentley and McIlroy, 1993, p. 1249)
- “Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983 would consume quadratic time on arrays that contain a few elements repeated many times—in particular arrays of random zeros and ones.” (Bentley and McIlroy, 1993, p. 1249)
- “The sort need not be stable; its specification does not promise to preserve the order of equal elements.” (Bentley and McIlroy, 1993, p. 1250)
- “Sedgewick studied Quicksort in his Ph.D. thesis” (Bentley and McIlroy, 1993, p. 1251)
- “A more efficient (and more familiar) partitioning method uses two indexes i and j. Index i scans up from the bottom of the array until it reaches a large element (greater than or equal to the partition value), and j scans down until it reaches a small element. The two array elements are then swapped, and the scans continue until the pointers cross. This algorithm is easy to describe, and also easy to get wrong—Knuth tells horror stories about inefficient partitioning algorithms.” (Bentley and McIlroy, 1993, p. 1252)
- “As a benchmark, swapping two integers in inline code takes just under a microsecond.” (Bentley and McIlroy, 1993, p. 1253)
- “using inline swaps for integer-sized objects and a function call otherwise.” (Bentley and McIlroy, 1993, p. 1254)
- “When a colleague and I modified sort to improve reliability and efficiency, we found that techniques that improved performance for other sorting applications sometimes degraded the performance of sort.’” (Bentley and McIlroy, 1993, p. 1254)
- “Partitioning about a random element takes C_n ∼ 1.386 n lg n comparisons. We now whittle away at the constant in the formula. If we were lucky enough to choose the median of every subarray as the partitioning element, we could reduce the number of comparisons to about n lg n.” (Bentley and McIlroy, 1993, p. 1254)
- “We adopted Tukey’s ‘ninther’, the median of the medians of three samples, each of three elements.” (Bentley and McIlroy, 1993, p. 1255)
- “Tripartite partitioning is equivalent to Dijkstra’s ‘Dutch National Flag’ problem.” (Bentley and McIlroy, 1993, p. 1257)
- “Quicksort with split-end partitioning (Program 7) is about twice as fast as the Seventh Edition qsort.” (Bentley and McIlroy, 1993, p. 1258)
- “Since the expected stack size is logarithmic in n, the stack is likely to be negligible compared to the data—only about 2,000 bytes when n = 1,000,000.” (Bentley and McIlroy, 1993, p. 1259)
- “We therefore emulated Knuth’s approach to testing TeX: ‘I get into the meanest, nastiest frame of mind that I can manage, and I write the nastiest code I can think of; then I turn around and embed that in even nastier constructions that are almost obscene.’” (Bentley and McIlroy, 1993, p. 1260)
- “P. McIlroy’s merge sort has guaranteed O (n log n) worst-case performance and is almost optimally adaptive to data with residual order (it runs the highly nonrandom certification suite of Figure 1 almost twice as fast as Program 7), but requires O (n) additional memory.” (Bentley and McIlroy, 1993, p. 1262)
- “The key to performance is elegance, not battalions of special cases.” (Bentley and McIlroy, 1993, p. 1263)
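A minimal sketch of Tukey’s “ninther” quoted above (my own Rust illustration; the paper’s actual code is C and also switches pivot strategies by array size): the partitioning element is the median of the medians of three samples of three elements each.
// median of three values without sorting
fn median3(a: i32, b: i32, c: i32) -> i32 {
    if (a <= b) == (b <= c) { b }
    else if (b <= a) == (a <= c) { a }
    else { c }
}

// pick a partitioning element for a (non-empty) large array via Tukey's ninther
fn ninther(v: &[i32]) -> i32 {
    let n = v.len();
    let step = n / 8;
    let (lo, mid, hi) = (0, n / 2, n - 1);
    let m1 = median3(v[lo], v[lo + step], v[lo + 2 * step]);
    let m2 = median3(v[mid - step], v[mid], v[mid + step]);
    let m3 = median3(v[hi - 2 * step], v[hi - step], v[hi]);
    median3(m1, m2, m3)
}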
summary
In this paper, the authors try to improve the performance of qsort, a Quicksort implementation by Lee McMahon based on Scowen’s ‘Quickersort’ that shipped with the Seventh Edition Unix system. Predictably, this Quicksort exhibits quadratic behavior on an easy-to-discover set of inputs. As a result, the authors look at other proposals and use techniques like split-end partitioning to improve upon this set of inputs.
An old but down-to-earth paper. It is interesting to see how asymptotic behavior was really the main driver for design considerations at that time (in contrast to today’s cache-centric considerations). I want to emphasize, though, that actual benchmarks were also provided in the paper.
typo
“to sift the” (Bentley and McIlroy, 1993, p. 1250)
Everything Old is New Again: Binary Security of WebAss… §
Title: “Everything Old is New Again: Binary Security of WebAssembly” by Daniel Lehmann, Johannes Kinder, Michael Pradel [url] [dblp]
Published in 2020 at USENIX Security 2020 and I read it in 2020-07
Abstract: WebAssembly is an increasingly popular compilation target designed to run code in browsers and on other platforms safely and securely, by strictly separating code and data, enforcing types, and limiting indirect control flow. Still, vulnerabilities in memory-unsafe source languages can translate to vulnerabilities in WebAssembly binaries. In this paper, we analyze to what extent vulnerabilities are exploitable in WebAssembly binaries, and how this compares to native code. We find that many classic vulnerabilities which, due to common mitigations, are no longer exploitable in native binaries, are completely exposed in WebAssembly. Moreover, WebAssembly enables unique attacks, such as overwriting supposedly constant data or manipulating the heap using a stack overflow. We present a set of attack primitives that enable an attacker (i) to write arbitrary memory, (ii) to overwrite sensitive data, and (iii) to trigger unexpected behavior by diverting control flow or manipulating the host environment. We provide a set of vulnerable proof-of-concept applications along with complete end-to-end exploits, which cover three WebAssembly platforms. An empirical risk assessment on real-world binaries and SPEC CPU programs compiled to WebAssembly shows that our attack primitives are likely to be feasible in practice. Overall, our findings show a perhaps surprising lack of binary security in WebAssembly. We discuss potential protection mechanisms to mitigate the resulting risks.
quotes
- “We find that many classic vulnerabilities which, due to common mitigations, are no longer exploitable in native binaries, are completely exposed in WebAssembly. Moreover, WebAssembly enables unique attacks, such as overwriting supposedly constant data or manipulating the heap using a stack overflow.”
- “WebAssembly is an increasingly popular bytecode language that offers a compact and portable representation, fast execution, and a low-level memory model. Announced in 2015 and implemented by all major browsers in 2017, WebAssembly is supported by 92% of all global browser installations as of June 2020. The language is designed as a compilation target, and several widely used compilers exist, e.g., Emscripten for C and C++, or the Rust compiler, both based on LLVM”
- “There are two main aspects to the security of the WebAssembly ecosystem: (i) host security, the effectiveness of the runtime environment in protecting the host system against malicious WebAssembly code; and (ii) binary security, the effectiveness of the built-in fault isolation mechanisms in preventing exploitation of otherwise benign WebAssembly code”
- “Comparing the exploitability of WebAssembly binaries with native binaries, e.g., on x86, shows that WebAssembly re-enables several formerly defeated attacks because it lacks modern mitigations. One example are stack-based buffer overflows, which are effective again because WebAssembly binaries do not deploy stack canaries.”
- “The original WebAssembly paper addresses this question briefly by saying that “at worst, a buggy or exploited WebAssembly program can make a mess of the data in its own memory”
- “Regarding data-based attacks, we find that one third of all functions make use of the unmanaged (and unprotected) stack in linear memory. Regarding control-flow attacks, we find that every second function can be reached from indirect calls that take their target directly from linear memory. We also compare WebAssembly’s type-checking of indirect calls with native control-flow integrity defenses.”
- “There are four primitive types: 32 and 64 bit integers (i32 , i64) and single and double precision floats (f32 , f64). More complex types, such as arrays, records, or designated pointers do not exist.”
- “Branches can only jump to the end of surrounding blocks, and only inside the current function. Multi-way branches can only target blocks that are statically designated in a branch table. Unrestricted gotos or jumps to arbitrary addresses are not possible. In particular, one cannot execute data in memory as bytecode instructions.”
- “The call_indirect instruction on the left pops a value from the stack, which it uses to index into the so called table section. Table entries map this index to a function, which is subsequently called. Thus, a function can only be indirectly called if it is present in the table.”
- “In contrast to other byte-code languages, WebAssembly does not provide managed memory or garbage collection. Instead, the so called linear memory is simply a single, global array of bytes.”
- “One of the most basic protection mechanisms in native programs is virtual memory with unmapped pages. A read or write to an unmapped page triggers a page fault and terminates the program, hence an attacker must avoid writing to such addresses. WebAssembly’s linear memory, on the other hand, is a single, contiguous memory space without any holes, so every pointer ∈ [0, max_mem] is valid. […] This is a fundamental limitation of linear memory with severe consequences. Since one cannot install guard pages between static data, the unmanaged stack, and the heap, overflows in one section can silently corrupt data in adjacent sections.”
- “In WebAssembly, linear memory is non-executable by design, as it cannot be jumped to.”
- “an overflow while writing into a local variable on the unmanaged stack, e.g., buffer, may overwrite other local variables in the same and even in other stack frames upwards in the stack”
- “Because in WebAssembly no default allocator is provided by the host environment, compilers include a memory allocator as part of the compiled program”
- “While standard allocators, such as dlmalloc, have been hardened against a variety of metadata corruption attacks, simplified and lightweight allocators are often vulnerable to classic attacks. We find both emmalloc and wee_alloc to be vulnerable to metadata corruption attacks, which we illustrate for a version of emmalloc in the following.”
- “As WebAssembly has no way of making data immutable in linear memory, an arbitrary write primitive can change the value of any non-scalar constant in the program, including, e.g., all string literals.”
- “Version 1.6.35 of libpng suffers from a known buffer overflow vulnerability (CVE-2018-14550 [3]), which can be exploited when converting a PNM file to a PNG file. When the library is compiled to native code with modern compilers on standard settings, stack canaries prevent this vulnerability from being exploited. In WebAssembly, the vulnerability can be exploited unhindered by any mitigations.”
- “While exec and the log_* functions have different C++ types, all three functions have identical types on the WebAssembly level (Figure 8b). The reason is that both integers and pointers are represented as i32 types in WebAssembly, i.e., the redirected call passes WebAssembly’s type check.”
- “To the best of our knowledge, it is the first security analysis tool for WebAssembly binaries. The analysis is written in Rust”
- “For example, with ten nested calls (assuming a uniform distribution of functions), there would be some data on the unmanaged stack with 1 − (1 − 0.33)^10 ≈ 98.2% probability.”
- “Averaged over all 26 programs, 9.8% of all call instructions are indirect,”
- “[…] how many functions are type-compatible with at least one call_indirect instruction and present in the table section. The percentage of indirectly callable functions ranges from 5% to 77.3%, with on average 49.2% of all functions in the program corpus.”
- “WebAssembly’s type checking of indirect calls can be seen as a form of control-flow integrity (CFI) for forward edges. Backward edges, i.e., returns, are protected due to being managed by the VM and offer security conceptually similar to shadow stack solutions.”
- multiple memories proposal: Andreas Rossberg. “Multiple per-module memories for Wasm”. https://github.com/WebAssembly/multi-memory, 2019.
- reference types proposal: Andreas Rossberg. “Proposal for adding basic reference types”. https://github.com/WebAssembly/reference-types, 2019.
- “Examples that would benefit Web-Assembly compilers are FORTIFY_SOURCE-like code rewriting, stack canaries, CFI defenses, and safe unlinking in memory allocators. In particular for stack canaries and rewriting commonly exploited C string functions, we believe there are no principled hindrances to deployment.”
- “Developers of WebAssembly applications can reduce the risk by using as little code in “unsafe” languages, such as C, as possible.”
- “The language has a formally defined type system shown to be sound”
- via [37] = “Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code”: “We find that the mean slowdown of WebAssembly vs. native across SPEC benchmarks is 1.55× for Chrome and 1.45× for Firefox, with peak slowdowns of 2.5× in Chrome and 2.08× in Firefox”
summary
Wonderful paper. Firstly, a security analysis was necessary, and I think they covered an appropriate range of attack vectors. Secondly, it is, of course, sad to see that many old vulnerabilities still work on a new platform.
I am unconditionally in favor of true read-only memory in WebAssembly. As David points out, this can also be enforced by a compiler through appropriate runtime checks. However, David also specified a criterion: iff a memory attack could lead to an escape from the WASM sandbox, then it is an issue of the WebAssembly sandbox and should be prevented at this level.
I keep wondering about stack canaries and guard pages. Maybe it was a trade-off decision between security and performance, but I am not 100% convinced. Thirdly, the paper is well-structured and gives sufficient data to support its arguments. The attacks were okay, the introduction to WebAssembly was superb, and justifying the claims about indirect calls with quantitative data in section 6 was outstanding. I think everyone in IT security can easily follow it. I love it!
WebAssembly architecture:
- not possible
- overwriting string literals in supposedly constant memory is not possible
- In WebAssembly, linear memory is non-executable by design, as it cannot be jumped to
- WebAssembly’s linear memory is a single, contiguous memory space without any holes, so every pointer ∈ [0, max_mem] is valid.
- measures, restrictions
- program memory, data structures of underlying VM and stack are separated
- binaries are easily type-checked
- only jump to designated code locations
- WebAssembly has two mechanisms that limit an attacker’s ability to redirect indirect calls. First, not all functions defined in or exported into a WebAssembly binary appear in the table for indirect calls, but only those that may be subject to an indirect call. Second, all calls, both direct and indirect, are type checked.
- missing
- In WebAssembly, there is no ASLR.
- does not use stack canaries
- buffer and stack overflows are thus very powerful attack primitives in WebAssembly
- an overflow while writing into a local variable on the unmanaged stack, e.g., buffer, may overwrite other local variables in the same and even in other stack frames upwards in the stack,
- As WebAssembly has no way of making data immutable in linear memory, an arbitrary write primitive can change the value of any non-scalar constant in the program, including, e.g., all string literals.
toc
• Introduction
• Background on WebAssembly
• Security Analysis of Linear Memory
• Managed vs. Unmanaged Data
• Memory Layout
• Memory Protections
• Attack Primitives
• Obtaining a Write Primitive
• Stack-based Buffer Overflow
• Stack Overflow
• Heap Metadata Corruption
• Overwriting Data
• Overwriting Stack Data
• Overwriting Heap Data
• Overwriting “Constant” Data
• Triggering Unexpected Behavior
• Redirecting Indirect Calls
• Code Injection into Host Environment
• Application-specific Data Overwrite
• End-to-End Attacks
• Cross-Site Scripting in Browsers
• Remote Code Execution in Node.js
• Arbitrary File Write in Stand-alone VM
• Quantitative Evaluation
• Experimental Setup and Analysis Process
• Measuring Unmanaged Stack Usage
• Measuring Indirect Calls and Targets
• Comparing with Existing CFI Policies
• Discussion of Mitigations
• WebAssembly Language
• Compilers and Tooling
• Application and Library Developers
• Related Work
• Conclusion
FastSpec: Scalable Generation and Detection of Spectre… §
Title: “FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings” by M. Caner Tol, Koray Yurtseven, Berk Gulmezoglu, Berk Sunar [url] [dblp]
Published in 2020 and I read it in 2020-07
Abstract: Several techniques have been proposed to detect vulnerable Spectre gadgets in widely deployed commercial software. Unfortunately, detection techniques proposed so far rely on hand-written rules which fall short in covering subtle variations of known Spectre gadgets as well as demand a huge amount of time to analyze each conditional branch in software. Since it requires arduous effort to craft new gadgets manually, the evaluations of detection mechanisms are based only on a handful of these gadgets.
ambiguity
- DNN (page 3) is undefined
- Equation 1: Notation X~Pdata is unknown to me
- “which alters the flags”. What is a flag? (Step 4, page 5)
quotes
- “Several techniques have been proposed to detect vulnerable Spectre gadgets in widely deployed commercial software. Unfortunately, detection techniques proposed so far rely on hand-written rules which fall short in covering subtle variations of known Spectre gadgets as well as demand a huge amount of time to analyze each conditional branch in software.”
- “In this work, we employ deep learning techniques for automated generation and detection of Spectre gadgets.”
- “Using mutational fuzzing, we produce a data set with more than 1 million Spectre-V1 gadgets which is the largest Spectre gadget data set built to date.”
- “we conduct the first empirical usability study of Generative Adversarial Networks (GANs) for creating assembly code without any human interaction.”
- “While the initial variants of Spectre [26] exploit conditional and indirect branches, Koruyeh et al. [27] proposes another Spectre variant obtained by poisoning the entries in Return-Stack-Buffers (RSBs).”
- “The proposed detection tools mostly implement taint analysis [66] and symbolic execution [15,65] to identify potential gadgets in benign applications. However, the methods proposed so far have two shortcomings: (1) the scarcity of Spectre gadgets prevents the comprehensive evaluation of the tools, (2) the scanning time increases drastically with increasing binary file sizes.”
- “BERT [9] was proposed by the Google AI team to learn the relations between different words in a sentence by applying a self-attention mechanism [63].”
- “In summary,
- We extend 15 base Spectre examples to 1 million gadgets via mutational fuzzing,
- We propose SpectreGAN which leverages conditional GANs to create new Spectre gadgets by learning the distribution of existing Spectre gadgets in a scalable way,
- We show that both mutational fuzzing and SpectreGAN create diverse and novel gadgets which are not detected by oo7 and Spectector tools
- We introduce FastSpec which is based on supervised neural word embeddings to identify the potential gadgets in benign applications orders of magnitude faster than rule-based methods.”
- “Hence, Spectre-type attacks are not completely resolved yet and finding an efficient countermeasure is still an open problem.”
- “There are a number of known Spectre variants: Spectre-V1 (bounds check bypass), Spectre-V2 (branch target injection), Spectre-RSB [27, 34] (return stack buffer speculation), and Spectre-V4 [19] (speculative store bypass).”
- “The heavy part of the training is handled by processing unlabeled data in an unsupervised manner. The unsupervised phase is called pre-training which consists of masked language model training and next sentence prediction procedures.”
- “Defenses against Spectre. There are various detection methods for speculative execution attacks. Taint analysis is used in oo7 [66] software tool to detect leakages. As an alternative way, the taint analysis is implemented in the hardware context to stop the speculative execution for secret dependent data [53, 71]. The second method relies on symbolic execution analysis. Spectector [15] symbolically executes the programs where the conditional branches are treated as mispredicted. Furthermore, SpecuSym [18] and KleeSpectre [65] aim to model cache usage with symbolic execution to detect speculative interference which is based on Klee symbolic execution engine. Following a different approach, Speculator [35] collects performance counter values to detect mispredicted branches and speculative execution domain. Finally, Specfuzz [44] uses a fuzzing strategy to analyze the control flow paths which are most likely vulnerable against speculative execution attacks.”
- “Our mutation operator is the insertion of random Assembly instructions with random operands.”
- “The overall success rate of fuzzing technique is around 5% out of compiled gadgets.”
- “the input assembly functions are converted to a sequence of tokens T' = {x'1, … x'N} where each token represents an instruction, register, parenthesis, comma, intermediate value or label. SpectreGAN is conditionally trained with each sequence of tokens where a masking vector m = (m1, …, mN) with elements mt ∈ {0, 1} is generated.”
- “The training procedure consists of two main phases namely, pre-training and adversarial training.”
- “We keep commas, parenthesis, immediate values, labels, instruction and register names as separate tokens.”
- “The tokenization process converts the instruction "movq (%rax), %rdx" into the list ["movq", "(", "%rax", ")", ",", "%rdx"] where each element of the list is a token x't. Hence, each token list T' = {x'1, …, x'N} represents an assembly function in the data set.”
- “SpectreGAN is trained with a batch size of 100 on NVIDIA GeForce GTX 1080 Ti until the validation perplexity converges in Figure 2. The pre-training lasts about 50 hours while the adversarial training phase takes around 30 hours.” (80 hours = 3d 8h)
- “SpectreGAN is trained with a masking rate of 0.3, the success rate of gadgets increases up to 72%. Interestingly, the success rate drops for other masking rates, which also demonstrates the importance of masking rate choice.”
- “To illustrate an example of the generated samples, we fed the gadget in Listing 2 to SpectreGAN and generated a new gadget in Listing 3.”
- “Mutational fuzzing and SpectreGAN generated approximately 1.2 million gadgets in total.”
- “The quality of generated texts is mostly evaluated by analyzing the number of unique n-grams.”
- “However, it is challenging to examine the effects of instructions in the transient domain since they are not visible in the architectural state. After we carefully analyzed the performance counters for the Haswell architecture, we determined that two counters namely, UOPS_ISSUED:ANY and UOPS_RETIRED:ANY give an idea to what extent the speculative window is altered. UOPS_ISSUED:ANY counter is incremented every time a μop is issued which counts both speculative and non-speculative μops. On the other hand, UOPS_RETIRED:ANY counter only counts the executed and committed μops which automatically excludes speculatively executed μops.”
- “we have selected 100,000 samples from each gadget example uniformly random due to the immense time consumption of oo7 (150 hours for 100K gadgets) which achieves 94% detection rate.”
- “Interestingly, specific gadget types from both fuzzing and SpectreGAN are not caught by oo7. When a gadget contains cmov or xchg or set instruction and its variants, it is not identified as a Spectre gadget.”
- “Listing 5: XCHG gadget: When a past value controlled by the attacker is used in the Spectre gadget, oo7 cannot detect the XCHG gadget”
- “For each Assembly file, Spectector is adjusted to track 25 symbolic paths of at most 5000 instructions each, with a global timeout of 30 minutes. The remaining parameters are kept as default.”
- “23.75% of the gadgets are not detected by Spectector. We observed that 96% of the undetected gadgets contain unsupported instruction/register which is the indicator of an implementation issue in Spectector.”
- “After we examined the undetected gadgets, we observed that if the gadgets include either sfence/mfence/lfence or 8-bit registers (%al, %bl, %cl, %dl), they are likely to bypass Spectector.”
- “Differently, the mask positions are selected from 15% of the training sequence and the selected positions are masked and replaced with <MASK> token with 0.80 probability, replaced with a random token with 0.10 probability or kept as the same token with 0.10 probability.”
- “Since it is not possible to visualize the high dimensional embedding vectors, we leverage the t-SNE algorithm [33] which maps the embedding vectors to a three-dimensional space as shown in Figure 4.”
- “The output probabilities of the softmax layer are the predictions on the assembly code sequence.”
- “We combine the assembly data set that was generated in Section 4 and the disassembled Linux libraries to train FastSpec. Although it is possible that Linux libraries contain Spectre-V1 gadgets, we assume that the total number of hidden Spectre gadgets are negligible comparing the total size of the data set.”
- “In total, a dataset of 107 million lines of assembly code is collected which consists of 370 million tokens after the pre-processing.”
- “The pre-training phase takes approximately 6 hours with a sequence length of 50. We further train the positional embeddings for 1 hour with a sequence length of 250. The fine-tuning takes only 20 minutes on the pre-trained model to achieve classifying all types of samples in the test data set correctly.”
- “In the evaluation of FastSpec, we obtained 1.3 million true positives and 110 false positives (99.9% precision rate) in the test data set which demonstrates the high performance of FastSpec. We assume that the false positives are Spectre-like gadgets in Linux libraries, which needs to be explored deeply in the future work. Moreover, we only have 55 false negatives (99.9% recall rate) which yields 0.99 F-1 score on the test data set.”
- “The processing time of FastSpec is independent of the number of branches whereas for Spectector and oo7 the analysis time increases drastically.”
- “Consequently, FastSpec is faster than oo7 and Spectector 455 times and 75 times on average, respectively.”
- “The total number of tokens is 203,055 while the analysis time is around 17 minutes.”
- “This work for the first time proposed NLP inspired approaches for Spectre gadget generation and detection.”
summary
Very advanced paper. Perfect combination of Machine Learning technology with microarchitectural attack work. Shows a huge effort and nice considerations regarding Generative Adversarial Networks. However, I could not understand all technical aspects of the machine learning part.
- Several techniques have been proposed to detect vulnerable Spectre gadgets in widely deployed commercial software. Unfortunately, detection techniques proposed so far rely on hand-written rules. Current shortcomings are: (1) the scarcity of Spectre gadgets prevents the comprehensive evaluation of the tools, (2) the scanning time increases drastically with increasing binary file sizes
- Approach:
- Mutational fuzzing is used to expand Kocher's 15 + Spectector's 2 Spectre gadgets to more than 1 million
- A Generative Adversarial Network (SpectreGAN; based on MaskGAN) is used to generate Assembly
- FastSpec (based on BERT by Google) takes Assembly and determines whether some binary contains a Spectre gadget
- They achieve 455× the performance of oo7 and 75× the performance of Spectector
- On the test data set (generated gadgets plus disassembled Linux libraries; the combined data set amounts to 107 million lines of ASM code), FastSpec obtains 1.3 million true positives and 110 false positives
- 379 matches were found in the OpenSSL 1.1.1g library
- On GitHub, there is a 390 MB tar.gz archive (split up). Decompressed, it has a size of 6.7 GB; 972 MB of that seem to be 239,025 Spectre test gadget files in ASM format
The masking rate (page 6) seems to be the percentage of hidden tokens during the training phase. Figure 4 is a little bit awkward and a little bit random. FastSpec/scan.sh seems to show how FastSpec was called to evaluate OpenSSL. And commands.txt tries to explain it somehow.
typo
- “The critical time period before the flush happens is commonly referred to the transient domain.” → “The critical time period before the flush happens is commonly referred to as the transient domain.”
- “microarchtiectures” → “microarchitectures”
- “An resule of a sample gadget” → “A result of a sample gadget”
High-speed Instruction-set Coprocessor for Lattice-bas… §
Title: “High-speed Instruction-set Coprocessor for Lattice-based Key Encapsulation Mechanism: Saber in Hardware” by Sujoy Sinha Roy, Andrea Basso [url] [dblp]
Published in 2020 at CHES 2020 and I read it in 2020-11
Abstract: In this paper, we present an instruction set coprocessor architecture for lattice-based cryptography and implement the module lattice-based post-quantum key encapsulation mechanism (KEM) Saber as a case study. To achieve fast computation time, the architecture is fully implemented in hardware, including CCA transformations. Since polynomial multiplication plays a performance-critical role in the module and ideal lattice-based public-key cryptography, a parallel polynomial multiplier architecture is proposed that overcomes memory access bottlenecks and results in a highly parallel yet simple and easy-to-scale design. Such multipliers can compute a full multiplication in 256 cycles, but are designed to target any area/performance trade-offs. Besides optimizing polynomial multiplication, we make important design decisions and perform architectural optimizations to reduce the overall cycle counts as well as improve resource utilization.
open questions
- Figure 3: Why are control signals leading to the building blocks like AddPack? Wouldn't it be simpler (for synchronization) to make control signals part of the communication over the bus?
- “The overhead of memory access during polynomial multiplication plays a critical role in lattice-based cryptography (e.g., [RVM+14], [BMTK+20]) and could hinder or complicate logic-level parallel processing.”
- What kind of role?
quotes
- “In 2012, Göttert et al. [GFS + 12] reported the first hardware implementation of the ideal lattice-based LPR [LPR10] public-key encryption scheme. Their implementation used a massively parallel and unrolled NTT-based polynomial multiplier architecture that consumed millions of LUTs and flip-flops.”
- “A comparison of most round 2 submissions, including Saber, can be found in [DFA + 20].”
- “In [DKSRV18] the authors of Saber proposed a fast polynomial multiplier based on the Toom-Cook algorithm [Knu97] and showed that a non-NTT parameter set does not make their implementation slow.”
- “In practice, more than 50% of the computation time is spent on generating pseudo-random numbers using SHAKE128, thus making it the performance bottleneck.”
- “Dang et al. [DFAG19] compare seven lattice-based key encapsulation methods on HW/SW codesign platforms. They report that out of the seven tested protocols (FrodoKEM, Round5, Saber, NTRU-HPS, NTRU-HRSS, Streamlined NTRU Prime and NTRULPRime), Saber is the fastest protocol in the encapsulation operation and second fastest in the decapsulation operation.”
- “At the same time, implementing such an accelerator is a challenging research topic because it requires making careful design decisions that take into account both algorithmic and architectural alternatives for the internal building blocks and their interactions at the protocol level.”
- “When a HW-only implementation is considered, one design option is to cascade different building blocks in the data-path following the standard data-flow model, if the blocks are required in multiple parallel instances.”
- “To achieve programmability and flexibility, we realize an instruction-set coprocessor architecture for Saber.”
- “In this work, we use the open-source high-speed implementation of the Keccak core that was designed by the Keccak Team [Tea19]. This high-speed implementation of Keccak computes ‘state-permutations’ at a gap of only 28 cycles, thus generating 1,344 bits of pseudo-random string every 28 cycles during the extraction-phase. Furthermore, we observed that one instance of the Keccak core consumes around 5K LUTs and 3K registers, which are respectively nearly 21% and 31% of the overall area in our implementation.”
- via SaberX4 paper: “Our proof-of-concept software implementation of SaberX4 achieves nearly 1.5 times higher throughput at the cost of latency degradation within acceptable margins, compared to the AVX2-optimized non-batched implementation of Saber by its authors.”
- “The use of ‘4-bit signed-magnitude’ representation simplifies the hardware architecture because we can store 16 such samples easily in a 64-bit word of the data memory. Thus, no sample is split across two words.”
- Consider that [-3, 3] requires 3 bits (7 states), [-4, 4] requires 4 bits (9 states) and [-5, 5] requires 4 bits (11 states). With 4 bits we cover all cases and no sample is split across byte boundaries. With 3 bits, the samples would be partitioned across 3 bytes as [3 3 2][1 3 3 1][2 3 3]. (See the packing sketch after this quote list.)
- “asymptotically the second fastest after the NTT-based polynomial multiplication.”
- Remove “the”, since Schönhage-Strassen and Fürer are two NTT-based multiplication algorithms
- “The hardware implementation of the Toom-Cook polynomial multiplication by Bermudo Mera et al. [BMTK + 20] describes the challenges in implementing the recursive function calls in hardware and proposes efficient architectures.”
- “This nega-cyclic rotation happens since the reduction-polynomial is x^256 + 1.”
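A small sketch of the 4-bit signed-magnitude packing discussed above (my own Rust illustration, not the paper’s hardware description): one sign bit plus three magnitude bits per sample, so 16 samples fit exactly into one 64-bit data word and no sample straddles a word boundary.
// pack 16 binomial samples (each in [-5, 5]) into one 64-bit word
fn pack(samples: &[i8; 16]) -> u64 {
    let mut word = 0u64;
    for (i, &s) in samples.iter().enumerate() {
        let sign = (s < 0) as u64;               // 1 sign bit
        let magnitude = s.unsigned_abs() as u64; // 3 magnitude bits (|s| <= 7)
        word |= ((sign << 3) | magnitude) << (4 * i);
    }
    word
}

fn unpack(word: u64) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        let nibble = (word >> (4 * i)) & 0xF;
        let magnitude = (nibble & 0x7) as i8;
        out[i] = if (nibble & 0x8) != 0 { -magnitude } else { magnitude };
    }
    out
}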
summary
A good read with appropriate comparisons to other schemes and implementations. Reasonable arguments are provided for the design decisions. The runtime results met expectations.
- “In this paper, we present an instruction set coprocessor architecture for lattice-based cryptography and implement the module lattice-based post-quantum key encapsulation mechanism (KEM) Saber as a case study.”
- “Since polynomial multiplication plays a performance-critical role in the module and ideal lattice-based public-key cryptography, a parallel polynomial multiplier architecture is proposed that overcomes memory access bottlenecks and results in a highly parallel yet simple and easy-to-scale design.”
- “For the module dimension 3 (security comparable to AES-192), the coprocessor computes CCA key generation, encapsulation, and decapsulation in only 5,453, 6,618 and 8,034 cycles respectively, making it the fastest hardware implementation of Saber to our knowledge.”
- “module dimension 3” corresponds to “NIST PQC category 3”
- “The Vivado project and all HDL source codes are available at https://github.com/sujoyetc/SABER_HW.”
- In Figure 3, the pseudo-random word from data memory is split into chunks of size μ. μ bits are turned into a binomial sample of size 4 bits.
- “There is no conditional branching in the algorithms used and all the building blocks have been designed to be constant-time.”
- “Software benchmarking [KRSS19] of many lattice-based KEM schemes have reported that 50-70% of the overall computation time is spent on executing the Keccak function, thus making it the most performance-critical component.”
- The Saber reference implementation uses code similar to Kyber for binomial sampling
- In Figure 7, the sign bit only depends on one argument, because s(x) is always positive.
- “The instruction-set coprocessor architecture is described in mixed Verilog and VHDL and is compiled using Xilinx Vivado for the target platform Xilinx ZCU102 board that has an UltraScale+ XCZU9EG-2FFVB1156 FPGA.”
- “We tested the functional correctness of the coprocessor on the ZCU102 board and at 250 MHz clock frequency, the CCA-secure key generation, encapsulation and decapsulation operations take 21.8, 26.5, and 32.1 μs respectively.”
- “As the polynomial multiplier architecture is scalable, we implemented a variant of it with MAC units fitting two multipliers. With this higher-performing architecture, the cycle counts for polynomial multiplications nearly halves, …”
- “The overall cycle count for Saber (module dimension 3) is 4,320, 5,231 and 6,461 for key generation, encapsulation, and decapsulation respectively. Thus, the cycle count is reduced by 21%, 21%, and 20% respectively. The increased speed comes with increased area consumption of 1.83× for LUTs and 1.74× for flip-flops (this is both due to the increased area consumption of the MAC units with two multipliers and due to the pipelining).”
- “In Table 5 we compare our flexible architecture with some of the recent hardware implementations of post-quantum KEM schemes.”
- “The SIKE [JF11] scheme relies on the computational hardness of the supersingular isogeny problem. Its most recent hardware implementation by Massolino et al. [MLRB20] targets high speed and even beats Frodo KEM. Our hardware implementation of Saber is around 500 to 600 times faster than their implementation.”
Instruction cycle counts:
Instruction                   Keygen         Encapsulation    Decapsulation
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
SHA3-256                         339               585              303
SHA3-512                           0                62               62
SHAKE-128                      1,461             1,403            1,403
Vector sampling                  176               176              176
Polynomial multiplications     2,685 (47%)       3,592 (54%)      4,484 (56%)
Remaining operations             792               800            1,606
Total cycles                   5,453             6,618            8,034
Total time at 250 MHz        21.8 μs           26.5 μs          32.1 μs
(Percentages give the share of polynomial multiplications in the column’s total cycle count.)
typo
- “This means that each MAC unit should receive in input” → “This means that each MAC unit should receive an input”
- “Nevertheless, our coprocessor has been tested in the hardware” → “Nevertheless, our coprocessor has been tested in hardware”
Historical Notes on the Fast Fourier Transform §
Title: “Historical Notes on the Fast Fourier Transform” by James W. Cooley, Peter A. W. Lewis, Peter D. Welch [url] [dblp]
Published in 1967 at IEEE 1967 and I read it in 2020-03
Abstract: The fast Fourier transform algorithm has a long and interesting history that has only recently been appreciated. In this paper, the contributions of many investigators are described and placed in historical perspective.
summary
- Survey paper on the historical developments of the four preceding years. Nice and straightforward summary, though I didn't go through the details of the prime factor algorithm.
- “The greatest emphasis, however, was on the computational economy that could be derived from the symmetries of the sine and cosine functions”
- “use the periodicity of the sine-cosine functions to obtain a 2N-point Fourier analysis from two N-point analyses with only slightly more than N operations. Going the other way, if the series to be transformed is of length N and N is a power of 2, the series can be split into log₂ N subseries”
- “The number of computations in the resulting successive doubling algorithm is therefore proportional to N log₂ N rather than N²”
- “The fast Fourier transform algorithm of Cooley and Tukey is more general in that it is applicable when N is composite and not necessarily a power of 2”
- “the algorithms are different for the following reasons: 1) in the Thomas algorithm the factors of N must be mutually prime; 2) in the Thomas algorithm the calculation is precisely multidimensional Fourier analysis with no intervening phase shifts or “twiddle factors” as they have been called; and 3) the correspondences between the one-dimensional index and the multidimensional indexes in the two algorithms are quite different.”
- “The factor W_N^{j_0 n_0}, referred to as the “twiddle factor” by Gentleman and Sande, is usually combined with either the W_r^{j_0 n_1} factor in (eq 4) or the W_s^{j_1 n_0} factor in (eq 5)”
- eq 4: A_1(j_0, n_0) = \sum_{n_1=0}^{r-1} A(n_1, n_0) W_r^{j_0 n_1}
- eq 5: X(j_1, j_0) = \sum_{n_0=0}^{s-1} A_1(j_0, n_0) W_s^{j_1 n_0} W_N^{j_0 n_0}
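A dependency-free sketch of the “successive doubling” idea quoted above (my own Rust illustration, not from the paper): a DFT of length N (a power of two) is computed from two DFTs of length N/2 combined with the twiddle factors W_N^k, which gives the N log₂ N operation count. Complex numbers are represented as (re, im) tuples so no external crate is needed.
// recursive radix-2 decimation-in-time FFT; n must be a power of two
fn fft(x: &[(f64, f64)]) -> Vec<(f64, f64)> {
    let n = x.len();
    if n == 1 {
        return vec![x[0]];
    }
    let even: Vec<_> = x.iter().step_by(2).copied().collect();
    let odd: Vec<_> = x.iter().skip(1).step_by(2).copied().collect();
    let (e, o) = (fft(&even), fft(&odd));
    let mut out = vec![(0.0, 0.0); n];
    for k in 0..n / 2 {
        let angle = -2.0 * std::f64::consts::PI * (k as f64) / (n as f64);
        let (wr, wi) = (angle.cos(), angle.sin()); // twiddle factor W_N^k
        let t = (wr * o[k].0 - wi * o[k].1, wr * o[k].1 + wi * o[k].0);
        out[k] = (e[k].0 + t.0, e[k].1 + t.1);
        out[k + n / 2] = (e[k].0 - t.0, e[k].1 - t.1);
    }
    out
}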
How Usable are Rust Cryptography APIs? §
Title: “How Usable are Rust Cryptography APIs?” by Kai Mindermann, Philipp Keck, Stefan Wagner [url] [dblp]
Published in 2018 at QRS 2018 and I read it in 2021-12
Abstract: Context: Poor usability of cryptographic APIs is a severe source of vulnerabilities. Aim: We wanted to find out what kind of cryptographic libraries are present in Rust and how usable they are. Method: We explored Rust’s cryptographic libraries through a systematic search, conducted an exploratory study on the major libraries and a controlled experiment on two of these libraries with 28 student participants. Results: Only half of the major libraries explicitly focus on usability and misuse resistance, which is reflected in their current APIs. We found that participants were more successful using rust-crypto which we considered less usable than ring before the experiment. Conclusion: We discuss API design insights and make recommendations for the design of crypto libraries in Rust regarding the detail and structure of the documentation, higher-level APIs as wrappers for the existing low-level libraries, and selected, good-quality example code to improve the emerging cryptographic libraries of Rust.
quotes
- “We found that participants were more successful using rust-crypto which we considered less usable than ring before the experiment. Conclusion: We discuss API design insights and make recommendations for the design of crypto libraries in Rust regarding the detail and structure of the documentation, higher-level APIs as wrappers for the existing low-level libraries, and selected, good-quality example code to improve the emerging cryptographic libraries of Rust.”
- “Georgiev et al. [3] analyzed SSL certificate validation code in a range of applications and found that the libraries themselves are correct “for the most part” but developers often misunderstand the APIs which is the “primary cause” for vulnerabilities.”
- “Egele et al. [5] automatically checked Android apps and found widespread security flaws such as ECB mode, constant keys, constant salts and constant pseudorandom number generator (PRNG) seeds, all of which can be traced back to a misuse of the cryptographic APIs in Android.”
- “Lazar et al. [6] investigated 269 vulnerabilities from the CVE database and found that the majority is caused by application code which misuses the properly implemented cryptographic libraries. Das and King [7] defined seven properties to determine how safe a cryptographic library is and applied them to six libraries for the most popular programming languages. Nadi et al. [8] empirically investigated the Java Cryptography Architecture (JCA) by analyzing StackOverflow posts, GitHub repositories and surveying developers. They found that the APIs are perceived as being too low-level and recommended task-based API (similar to the cryptography.io library) and improved documentation as solutions.”
- “Through our search, we found the following 81 libraries:
- 10 crypto-specific utility libraries for constant-time operations, secret memory and similar.
- 13 larger libraries which offer multiple primitives and usually multiple algorithms per primitive. The implementations are either written in Rust or attached through wrappers to code in another language. The major libraries in this category are introduced in section III-A.
- 35 libraries which implement a single primitive, algorithm or a small family of them in Rust.
- 5 libraries which offer a simpler interface to other implementations for specific application scenarios.
- 18 libraries that implement cryptosystems or protocols (mostly Transport Layer Security (TLS)) and usually depend on lower-level libraries.”
- “We determined major libraries semi-manually: Any library with 20 or more dependent crates is considered a major library.”
- “Two projects aim to implement the most relevant cryptographic primitives natively in Rust, to eventually become an alternative to the established crypto libraries in C. The older project is rust-crypto which is the second most popular crypto library after rust-openssl and the newer one is octavo which is still incomplete and insecure.”
- “A relatively new library that aspires to provide even more usable and misuse-resistant APIs is ring. It uses a slimmed-down code base of BoringSSL, Google’s simplified OpenSSL fork and excludes the entire TLS stack and most deprecated algorithms. The API is developed independently of the underlying Assembler/C code and notable efforts and innovations are made regarding misuse resistance. Because ring is technically a fork of BoringSSL, all contributions to BoringSSL are also counted towards ring’s statistics.”
- “Even after these subtractions, ring still has by far the highest number of commits and quite many contributors considering its young age.”
- “They are not independent, though, as rust-crypto’s API design cleverly integrates elements in a way that makes sense from an implementation perspective: the Hmac<D> takes a hashing algorithm (digest) D as a parameter, for example. The same concept is applied throughout the library, e.g., the generic CTR mode implementation can be used with any block cipher.” (see the first sketch after this quote list)
- “Misuse resistance: The HMAC API returns a custom struct which overrides the == operator so that comparisons are performed in constant time.”
- “Misuse resistance: ring exclusively offers authenticated kinds of encryption which prevent accidental misuse of unauthenticated encryption. The HMAC module provides a verify() function that uses a constant-time comparison.”
- “Although the top-level documentation contains extensive information about how a cryptosystem is defined mathematically, about Kerckhoff’s Principle and about what key lengths are secure, it does not inform about the key/initialization vector (IV) lengths actually required by the ChaCha20 cipher. Digging into the code (or trial and error) revealed that they must be multiples of 32 bits; hence the 112 bits recommended for ‘medium-term protection’ by the documentation would not work.”
- “This makes the components nicely composable but often requires the caller to use the unpleasant ‘turbofish’ operator.”
- “The explorer often had trouble preallocating the vectors for &mut [u8] parameters (Rust’s out-parameters). There is a simple solution since March 2015: vec![0; length]. But before that many Rust users were frustrated when allocating a vector that some ridiculous solutions were suggested, including one that requires two lines of code and an unsafe block (still the accepted answer) or one that creates an infinite iterator and collects it.” (see the second sketch after this quote list)
- “The controlled experiment was conducted with students of the University of Osnabrück, who had just completed a semester-long lecture on the Rust programming language.”
- “Before we describe the sample, we exclude some of the participant’s results because participants 3, 4, 6, 14 and 23 tried to implement the Advanced Encryption Standard themselves but failed. We ignore their data as they might have performed differently if the task had been understood better. For the same reason we discount participant 15 who implemented a Caesar cipher (unsuccessful). We also disregard the data from the supervisor (participant 13) of the course. This leaves 11 participants in both groups.”
- “The remaining 22 participants were between 19 and 26 years old (median 22.5) and most of them were male (seven were female).”
- “Calculating the effectiveness for each participant and summing it up to get a total effectiveness of the experiment sample we got E_rc = 0.66 for rust-crypto and E_ring = 0.28 for ring. The distributions for the effectiveness of the two libraries is depicted in the top left boxplot in Figure 1.”
- “Hence, we could not reject the null hypothesis that there is no difference in the effectiveness of using rust-crypto and ring.”
- “Satisfaction can be measured by standardized questionnaires. We used the System Usability Scale (SUS) by Brooke [15] (similar to [13]) which is simple, quick and accurate [16] as it contains only a set of 10 questions.”
- “Only 4 of 7 (57%) who used the example code from/for rust-crypto, completed the task. Participants who did not use the example did not complete the task.”
- “Not a single participant was worried about the security implications of their choices during the experiment. Everyone went with the defaults provided by the library and tried to get something working. Those who succeeded did not reconsider earlier choices. Accordingly, the participants were rather unsure about the security of their code. In particular, all rust-crypto users ended up using unauthenticated encryption without knowing its potential dangers and the vast majority stuck to the CBC mode and PKCS padding given by the code example.”
- “The code example also made rust-crypto users more confident in the perceived security of their solution, […]”
- “[…] a relatively large number of participants also visited Wikipedia and StackOverflow to learn about AES encryption, nonces and other topics.”
- “In-place APIs are difficult: […]”
- “This kind of in-place API has a number of benefits: it does not require extra heap allocations on the part of the library – some environments do not have a heap, so this can be a hard requirement – and it uses minimal extra memory, as the plaintext can be overwritten.” (see the second sketch after this quote list)
- “From the comments about the experiment we also get ‘Did I mention that I missed a good documentation for rust-crypto?’ and ‘The ring library desperately needs a good documentation with examples’”
- “Viewing the current Rust crypto APIs in the light of recent research, we found that: Insecure defaults do not occur and most APIs try to avoid defaults entirely. Authenticated encryption is not advertised enough in low-level libraries, whereas the high-level libraries omit unauthenticated encryption altogether for maximal misuse resistance. Few high-level libraries are available. A few projects do not warn about deprecated/broken algorithms. There are no measures against accidental nonce reuse.
- Do not explain cryptographic concepts yourself but link to comprehensible resources.
- Make recommendations when there are multiple choices. E.g. if parameters can be constructed differently, it should be explained what the different choices imply.”
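The Hmac<D> and constant-time comparison quotes above translate roughly into the following Rust sketch. This is my own illustration, not rust-crypto’s actual API: the trait, struct and method names are invented, and compute() is not a real HMAC construction; it only shows how a MAC type can be generic over a digest and return a result whose == comparison runs in constant time.

```rust
// Sketch only: invented names, not rust-crypto's real API.
trait Digest {
    fn digest(&self, data: &[u8]) -> Vec<u8>;
}

/// A MAC parameterized over a digest type, mirroring the Hmac<D> idea.
struct Hmac<D: Digest> {
    key: Vec<u8>,
    inner: D,
}

/// Wrapper type whose equality check is data-independent.
struct MacResult(Vec<u8>);

impl PartialEq for MacResult {
    // Overriding == so that tag comparisons do not leak timing information.
    fn eq(&self, other: &Self) -> bool {
        if self.0.len() != other.0.len() {
            return false;
        }
        let mut diff = 0u8;
        for (a, b) in self.0.iter().zip(other.0.iter()) {
            diff |= *a ^ *b; // accumulate differences without an early exit
        }
        diff == 0
    }
}

impl<D: Digest> Hmac<D> {
    fn new(inner: D, key: &[u8]) -> Self {
        Hmac { inner, key: key.to_vec() }
    }
    // Not a real HMAC; just key || message fed to the digest to show the API shape.
    fn compute(&self, message: &[u8]) -> MacResult {
        let mut input = self.key.clone();
        input.extend_from_slice(message);
        MacResult(self.inner.digest(&input))
    }
}
```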
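A second sketch, again with hypothetical function names rather than anything from ring or rust-crypto, contrasts the two buffer-handling styles discussed above: an out-parameter that the caller preallocates with vec![0; length], and an in-place variant that overwrites the input and therefore needs no extra allocation.

```rust
// Hypothetical signatures; the "cipher" is a plain XOR just to exercise them.
fn encrypt_into(plaintext: &[u8], key: &[u8], out: &mut [u8]) {
    for (i, byte) in plaintext.iter().enumerate() {
        out[i] = *byte ^ key[i % key.len()];
    }
}

fn main() {
    let plaintext = b"attack at dawn";
    let key = [0x42u8; 16];

    // Out-parameter style: caller preallocates with the one-liner vec![0; length].
    let mut ciphertext = vec![0u8; plaintext.len()];
    encrypt_into(plaintext, &key, &mut ciphertext);

    // In-place style: no extra buffer at all, the plaintext is overwritten.
    let mut buffer = plaintext.to_vec();
    for (i, byte) in buffer.iter_mut().enumerate() {
        *byte ^= key[i % key.len()];
    }
    assert_eq!(buffer, ciphertext);
}
```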
summary
A well-written usability paper. The authors thoroughly determine a list of Rust cryptographic libraries and their history. Subsequently, one author tried to implement an advanced use case with the rust-crypto, ring, rust-openssl, rust_sodium, and octavo libraries. Then 22 participants with little cryptographic knowledge were asked to implement a basic use case. The results were as expected but led to practical improvement suggestions (which were partially adopted).
- RQ1 is a vague research question
- The loop in the source code of section 5b is really terrible.
- avoiding dynamic memory allocation does not necessarily require an in-place API (discussion in section 5.J): the caller can supply an output buffer and the result is written there, so the data of the original object does not need to be modified to store the result
typos
- “Therefor we decided” → “Therefore, we decided”
In defense of PowerPoint §
Title: “In defense of PowerPoint” by N. Holmes [url] [dblp]
Published in 2004 at and I read it in 2021-08
Abstract:
quotes
- “Computing professionals who blame their machinery for their failures set a bad example for computer users already prone to using the computer as a scapegoat.”
- “PowerPoint is just presentation technology’s latest iteration and will eventually be replaced by something else.”
- “Presentation technology first took a direct form. In Europe more than two millennia ago, presenters developed mnemonic techniques. For centuries, early books served only as a reference for presentations, the idea of reading silently being considered strange when first introduced.”
- “Currently, presenters most neglect the persuasive aspect, yet in olden times, knowledge of rhetorical principles was considered one of a classical education’s more important benefits.”
- “Some teachers see PowerPoint as a splendid tool to help them convey ideas. Others prefer to use browser-driven HTML.”
- “Our digital technology would be better used in the classroom by administering drill and practice as a foundation for literacy and numeracy so that teachers can concentrate on the more important job of inculcating and encouraging social capability, which they must do personally.”
summary
The argument is “blame the user, not the tool”. Otherwise this article just recites known arguments.
Languages and the computing profession §
Title: “Languages and the computing profession” by N. Holmes [url] [dblp]
Published in 2004 at and I read it in 2021-09
Abstract:
quotes
- “as in the case of German versus English as Samuel Langhorne Clemens described (www.bdsnett.no/klaus/twain).”
- “The difficulty here is deciding which meanings are primary.”
- “Regularity. The rules for combining and ordering codes, and for systematic codes such as those for colors, must be free from exceptions and variations.”
- “Designing the intermediate language to be spoken as words and thus to serve as an auxiliary language would be a mistake.”
- “First, designing the intermediate language for general auxiliary use would unnecessarily and possibly severely impair its function as an intermediary. Second, a global auxiliary language’s desirable properties differ markedly from those needed for an intermediary in translation, as the auxiliary language Esperanto’s failure in the intermediary role demonstrates.”
- “Defining the intermediate language requires developing and verifying its vocabulary and grammar as suitable for mediating translation between all classes and kinds of natural language.”
- “Indeed, the qualities of an intermediate language could make search engines much more effective.”
- “Strategically, a much better way to use digital technology to help the poor and counter global inequity and its symptomatic digital divide would be for the UN to take responsibility for the development and use of a global intermediate translation language.”
summary
Very shallow reading which proposes an intermediate language. He discusses some properties (specificity, precision, regularity, literality, neutrality) but fails to achieve a coherent requirements analysis.
McBits: Fast Constant-Time Code-Based Cryptography §
Title: “McBits: Fast Constant-Time Code-Based Cryptography” by Daniel J. Bernstein, Tung Chou, Peter Schwabe [url] [dblp]
Published in 2013 at CHES 2013 and I read it in 2022-04
Abstract: This paper presents extremely fast algorithms for code-based public-key cryptography, including full protection against timing attacks. For example, at a 2^128 security level, this paper achieves a reciprocal decryption throughput of just 60493 cycles (plus cipher cost etc.) on a single Ivy Bridge core. These algorithms rely on an additive FFT for fast root computation, a transposed additive FFT for fast syndrome computation, and a sorting network to avoid cache-timing attacks.
quotes
- “To summarize, all of these examples of bitsliced speed records are for small Sboxes or large binary fields, while code-based cryptography relies on medium-size fields and seems to make much more efficient use of table lookups.” (Bernstein et al., 2013, p. 3)
- “[…] we point out several ways that our decryption algorithm improves upon the algorithm used in [44]: we use an additive FFT rather than separate evaluations at each point (“Chien search”); we use a transposed additive FFT rather than applying a syndrome-conversion matrix; we do not even need to store the syndrome-conversion matrix, the largest part of the data stored in [44]; and we use a simple hash (see Section 6) rather than a constant-weight-word-to-bit-string conversion” (Bernstein et al., 2013, p. 6)
- “For multipoint evaluation we use a characteristic-2 “additive FFT” algorithm introduced in 2010 [39] by Gao and Mateer (improving upon an algorithm by von zur Gathen and Gerhard in [40], which in turn improves upon an algorithm proposed by Wang and Zhu in [77] and independently by Cantor in [29]), together with some new improvements described below.” (Bernstein et al., 2013, p. 7)
- “The basic idea of the algorithm is to write f in the form f_0(x^2 − x) + x·f_1(x^2 − x) for two half-degree polynomials f_0, f_1 ∈ F_q[x]; this is handled efficiently by the ‘radix conversion’ described below. This form of f shows a large overlap between evaluating f(α) and evaluating f(α + 1). Specifically, (α + 1)^2 − (α + 1) = α^2 − α, so
  f(α) = f_0(α^2 − α) + α·f_1(α^2 − α),
  f(α + 1) = f_0(α^2 − α) + (α + 1)·f_1(α^2 − α).
  Evaluating both f_0 and f_1 at α^2 − α produces both f(α) and f(α + 1) with just a few more field operations: multiply the f_1 value by α, add the f_0 value to obtain f(α), and add the f_1 value to obtain f(α + 1).” (Bernstein et al., 2013, p. 8)
- “Consider the problem of computing the vector (∑_α r_α, ∑_α r_α·α, …, ∑_α r_α·α^d), given a sequence of q elements r_α ∈ F_q indexed by elements α ∈ F_q, where q = 2^m. This vector is called a ‘syndrome’.” (Bernstein et al., 2013, p. 10)
- “The transposition principle states that if a linear algorithm computes a matrix M (i.e., M is the matrix of coefficients of the inputs in the outputs) then reversing the edges of the linear algorithm, and exchanging inputs with outputs, computes the transpose of M .” (Bernstein et al., 2013, p. 12)
- “In particular, since syndrome computation is the transpose of multipoint evaluation, reversing a fast linear algorithm for multipoint evaluation produces a fast linear algorithm for syndrome computation.” (Bernstein et al., 2013, p. 12)
- “This procedure produced exactly the desired number of operations in Fq but was unsatisfactory for two reasons. First, there were a huge number of nodes in the graph, producing a huge number of variables in the final software. Second, this procedure eliminated all of the loops and functions in the original software, producing a huge number of lines of code in the final software. Consequently the C compiler, gcc, became very slow as m increased and ran out of memory around m = 13 or m = 14, depending on the machine we used for compilation.” (Bernstein et al., 2013, p. 13)
- “A ‘sorting network’ uses a sequence of ‘comparators’ to sort an input array S. A comparator is a data-independent pair of indices (i, j); it swaps S[i] with S[j] if S[i] > S[j]. This conditional swap is easily expressed as a data-independent sequence of bit operations: first some bit operations to compute the condition S[i] > S[j], then some bit operations to overwrite (S[i], S[j]) with (min {S[i], S[j]}, max {S[i], S[j]}). There are many sorting networks in the literature. We use a standard ‘odd-even’ sorting network by Batcher [4], which uses exactly (m^2 − m + 4)·2^(m−2) − 1 comparators to sort an array of 2^m elements. This is more efficient than other sorting networks such as Batcher’s bitonic sort [4] or Shell sort [72]. The odd-even sorting network is known to be suboptimal when m is very large (see [2]), but we are not aware of noticeably smaller sorting networks for the range of m used in code-based cryptography.” (Bernstein et al., 2013, p. 14) (see the sketch after this quote list)
- “Our goals in this paper are more conservative, so we avoid this approach: we are trying to reduce, not increase, the number of questions for cryptanalysts.” (Bernstein et al., 2013, p. 16)
- “Code-based cryptography is often presented as encrypting fixed-length plaintexts. McEliece encryption multiplies the public key (a matrix) by a k-bit message to produce an n-bit codeword and adds t random errors to the codeword to produce a ciphertext. The Niederreiter variant (which has several well-known advantages, and which we use) multiplies the public key by a weight-t n-bit message to produce an (n − k)-bit ciphertext. If the t-error decoding problem is difficult for the public code then both of these encryption systems are secure against passive attackers who intercept valid ciphertexts for random plaintexts.” (Bernstein et al., 2013, p. 16)
- “However, this argument relies implicitly on a detailed analysis of how much information the attacker actually obtains through timing. By systematically eliminating all timing leaks we eliminate the need for such arguments and analyses.” (Bernstein et al., 2013, p. 18)
- “A security proof for Niederreiter KEM/DEM appeared very recently in Persichetti’s thesis [64]. The proof assumes that the t-error decoding problem is hard; it also assumes that a decoding failure for w is indistinguishable from a subsequent MAC failure. This requires care in the decryption procedure; see below.” (Bernstein et al., 2013, p. 18)
- “Many authors have stated that Patterson’s method is somewhat faster than Berlekamp’s method.” (Bernstein et al., 2013, p. 19)
- “CFS is a code-based public-key signature system proposed by Courtois, Finiasz, and Sendrier in [31]. The main drawbacks of CFS signatures are large public-key sizes and inefficient signing; the main advantages are short signatures, fast verification, and post-quantum security.” (Bernstein et al., 2013, p. 19)
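For illustration, here is a branchless comparator in Rust in the spirit of the quoted description; this is my own sketch, not the McBits code. The five comparator pairs used in the demo form the odd-even network for four elements, matching the (m^2 − m + 4)·2^(m−2) − 1 = 5 count for m = 2.

```rust
// Data-independent conditional swap: no branch depends on the secret values.
// The sign trick assumes the values fit in 31 bits, which holds for the demo.
fn comparator(s: &mut [u32], i: usize, j: usize) {
    let (a, b) = (s[i], s[j]);
    // mask is all-ones iff a > b
    let mask = ((b.wrapping_sub(a) >> 31) & 1).wrapping_neg();
    s[i] = (a & !mask) | (b & mask); // min(a, b)
    s[j] = (b & !mask) | (a & mask); // max(a, b)
}

fn main() {
    let mut s = [5u32, 3, 9, 1];
    // Odd-even network for 4 elements: sort the two pairs, then merge.
    for &(i, j) in &[(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)] {
        comparator(&mut s, i, j);
    }
    assert_eq!(s, [1, 3, 5, 9]);
}
```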
strong statement
“We have found many claims that NTRU is orders of magnitude faster than RSA and ECC, but we have also found no evidence that NTRU can match our speeds” (Bernstein et al., 2013, p. 5)
summary
The paper describes multiple optimization techniques useful for code-based cryptography. It mentions 400,000 decryptions per second at the 2^80 security level and 200,000 per second at the 2^128 security level. Among the techniques, Horner’s rule, the Gao–Mateer additive FFT (used instead of Chien search), and syndrome computation as the transpose of multipoint evaluation are mentioned.
Neat paper starting with prior work to describe optimized software implementations for Niederreiter’s cryptosystem as well as the CFS signature scheme. One question discussed is whether bitslicing is worth the effort. Interestingly, the scheme is explained at the end (section 6), not at the beginning. The data is described in text and not summarized in a table. I think the main contribution is the set of optimization techniques in section 3.
NTRU: A ring-based public key cryptosystem §
Title: “NTRU: A ring-based public key cryptosystem” by Jeffrey Hoffstein, Jill Pipher, Joseph H. Silverman [url] [dblp]
Published in 1998 at ANTS 1998 and I read it in 2021-11
Abstract: We describe NTRU, a new public key cryptosystem. NTRU features reasonably short, easily created keys, high speed, and low memory requirements. NTRU encryption and decryption use a mixing system suggested by polynomial algebra combined with a clustering principle based on elementary probability theory. The security of the NTRU cryptosystem comes from the interaction of the polynomial mixing system with the independence of reduction modulo two relatively prime integers p and q.
quotes
- “Currently, the most widely used public key system is RSA, which was created by Rivest, Shamir and Adelman in 1978 [9] and is based on the difficulty of factoring large numbers. Other systems include the McEliece system [8] which relies on error correcting codes, and a recent system of Goldreich, Goldwasser, and Halevi [4] which is based on the difficulty of lattice reduction problems.”
- “In this paper we describe a new public key cryptosystem, which we call the NTRU system. The encryption procedure uses a mixing system based on polynomial algebra and reduction modulo two numbers p and q, while the decryption procedure uses an unmixing system whose validity depends on elementary probability theory. The security of the NTRU public key cryptosystem comes from the interaction of the polynomial mixing system with the independence of reduction modulo p and q. Security also relies on the (experimentally observed) fact that for most lattices, it is very difficult to find extremely short (as opposed to moderately short) vectors.”
- “Encryption and decryption with NTRU are extremely fast, and key creation is fast and easy. See Section 5 for specifics, but we note here that NTRU takes O(N^2) operations to encrypt or decrypt a message block of length N, making it considerably faster than the O(N^3) operations required by RSA. Further, NTRU key lengths are O(N), which compares well with the O(N^2) key lengths required by other "fast" public keys systems such as [8, 4].”
- “In principle, computation of a product F⭙G requires N^2 multiplications. However, for a typical product used by NTRU, one of F or G has small coefficients, so the computation of F⭙G is very fast. On the other hand, if N is taken to be large, then it might be faster to use Fast Fourier Transforms to compute products F⭙G in O(N log N) operations.”
- “For appropriate parameter values, there is an extremely high probability that the decryption procedure will recover the original message. However, some parameter choices may cause occasional decryption failure, so one should probably include a few check bits in each message block. The usual cause of decryption failure will be that the message is improperly centered, In this case Dan will be able to recover the message by choosing the coefficients of a ≡ f ⭙ e (mod q) in a slightly different interval, for example from -q/2 + x to q/2 + x for some small (positive or negative) value of x. If no value of x works, then we say that we have gap failure and the message cannot be decrypted as easily. For well-chosen parameter values, this will occur so rarely that it can be ignored in practice.”
- “In order for the decryption process to work, it is necessary that |f⭙m + pφ⭙g|_∞ < q. We have found that this will virtually always be true if we choose parameters so that |f⭙m|_∞ ≤ q/4 and |pφ⭙g|_∞ ≤ q/4, and in view of the above Proposition, this suggests that we take |f|_2·|m|_2 ≈ q/4γ_2 and |φ|_2·|g|_2 ≈ q/4pγ_2 for a γ_2 corresponding to a small value for ε.” (see the compact restatement after this quote list)
- “An attacker can recover the private key by trying all possible f ∈ L_f and testing if f⭙h (mod q) has small entries, or by trying all g ∈ L_g and testing if g⭙h^(-1) (mod q) has small entries. Similarly, an attacker can recover a message by trying all possible φ ∈ L_φ and testing if e − φ⭙h (mod q) has small entries.”
- Security analysis
- Brute force attacks
- Meet-in-the-middle attacks
- Multiple transmission attacks
- Lattice based attacks
- “The object of this section is to give a brief analysis of the known lattice attacks on both the public key h and the message m. We begin with a few words concerning lattice reduction. The goal of lattice reduction is to find one or more "small" vectors in a given lattice. In theory, the smallest vector can be found by an exhaustive search, but in practice this is not possible if the dimension is large. The LLL algorithm of Lenstra-Lenstra-Lovász [7], with various improvements due to Schnorr and others, [10, 12, 11] will find relatively small vectors in polynomial time, but even LLL will take a long time to find the smallest vector provided that the smallest vector is not too much smaller than the expected length of the smallest vector. We will make these observations more precise below.”
- “In this section we describe our preliminary analysis of the security of the NTRU Public Key Cryptosystem from attacks using lattice reduction methods. It is based on experiments which were performed using version 1.7 of Victor Shoup's implementation of the Schnorr, Euchner and Hoerner improvements of the LLL algorithm, distributed in his NTL package at http://www.cs.wisc.edu/~shoup/ntl/ . The NTL package was run on a 200 MHz Pentium Pro with a Linux operating system.”
- “Comparison With Other PKCS's . There are currently a number of public key cryptosystems in the literature, including the system of Rivest, Shamir, and Adelman (RSA [9]) based on the difficulty of factoring, the system of McEliece [8] based on error correcting codes, and the recent system of Goldreich, Goldwasser, and Halevi (GGH [4]) based on the difficulty of finding short almost-orthogonalized bases in a lattice.”
- “The NTRU system has some features in common with McEliece's system, in that ⭙-multiplication in the ring R can be formulated as multiplication of matrices (of a special kind), and then encryption in both systems can be written as a matrix multiplication E = AX + Y, where A is the public key. A minor difference between the two systems is that for an NTRU encryption, Y is the message and X is a random vector, while the McEliece system reverses these assignments. But the real difference is the underlying trapdoor which allows decryption. For the McEliece system, the matrix A is associated to an error correcting (Goppa) code, and decryption works because the random contribution is small enough to be "corrected" by the Goppa code. For NTRU, the matrix A is a circulant matrix, and decryption depends on the decomposition of A into a product of two matrices having a special form, together with a lifting from mod q to mod p.”
- “As far as we can tell, the NTRU system has little in common with the RSA system.”
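As a compact restatement in my own notation (not quoted from the paper): with private key f, small polynomial g, f_p and f_q the inverses of f mod p and mod q, public key h ≡ f_q⭙g (mod q), and a fresh random φ per message:

```latex
\begin{align*}
  \text{encryption:} \quad & e \equiv p\,\varphi \circledast h + m \pmod{q} \\
  \text{decryption:} \quad & a \equiv f \circledast e \equiv p\,\varphi \circledast g + f \circledast m \pmod{q},
      \quad \text{coefficients of } a \text{ centered in } (-q/2,\, q/2] \\
                           & m \equiv f_p \circledast a \pmod{p}
\end{align*}
```

The centering step is exactly where the |f⭙m + pφ⭙g|_∞ condition from the quote above comes in: if that norm stays below q, no coefficient wraps, the reduction mod p strips the pφ⭙g term, and multiplying by f_p recovers m.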
summary
Very fundamental, important paper. NTRU has the unique property of a lattice scheme switching between moduli (“polynomial mixing system” or “lifting operation”). It is interesting to see that they saw RSA and GGH as competitors for NTRU. I think they did a very good job looking at its security analysis, but I didn't look into the details.
New directions in cryptography §
Title: “New directions in cryptography” by W. Diffie, M. Hellman [url] [dblp]
Published in 1976 at Invited paper, IEEE Transactions on Information Theory, Volume 22, Issue 6 and I read it in 2021-05
Abstract:
definitions
- “In a public key cryptosystem enciphering and deciphering are governed by distinct keys, E and D, such that computing D from E is computationally infeasible (e.g., requiring 10^100 instructions).”
- “Public key distribution systems offer a different approach to eliminating the need for a secure key distribution channel. In such a system, two users who wish to exchange a key communicate back and forth until they arrive at a key in common. A third party eavesdropping on this exchange must find it computationally infeasible to compute the key from the information overheard,”
- “A privacy system prevents the extraction of information by unauthorized parties from messages”
- “An authentication system prevents the unauthorized injection of messages into a public channel, assuring the receiver of a message of the legitimacy of its sender.”
- “A cryptographic system is a single parameter family {S_K}_{K∈K} of invertible transformations S_K: {P} → {C} from a space {P} of plaintext messages to a space {C} of ciphertext messages. The parameter K is called the key and is selected from a finite set {K} called the keyspace. If the message spaces {P} and {C} are equal, we will denote them both by {M}.”
- “A system which is secure due to the computational cost of cryptanalysis, but which would succumb to an attack with unlimited computation, is called computationally secure; while a system which can resist any cryptanalytic attack, no matter how much computation is allowed, is called unconditionally secure.”
- “We will call a task computationally infeasible if its cost as measured by either the amount of memory used or the runtime is finite but impossibly large.”
- “A ciphertext only attack is a cryptanalytic attack in which the cryptanalyst possesses only ciphertext.”
- “A known plaintext attack is a cryptanalytic attack in which the cryptanalyst possesses a substantial quantity of corresponding plaintext and ciphertext.”
- “A chosen plaintext attack is a cryptanalytic attack in which the cryptanalyst can submit an unlimited number of plaintext messages of his own choosing and examine the resulting cryptograms.”
- “A public key cryptosystem is a pair of families {E_K}_{K∈K} and {D_K}_{K∈K} of algorithms representing invertible transformations, E_K: {M} → {M} and D_K: {M} → {M} on a finite message space {M} such that
  - for every K ∈ {K}, E_K is the inverse of D_K,
  - for every K ∈ {K} and M ∈ {M}, the algorithms E_K and D_K are easy to compute,
  - for almost every K ∈ {K}, each easily computed algorithm equivalent to D_K is computationally infeasible to derive from E_K,
  - for every K ∈ {K}, it is feasible to compute inverse pairs E_K and D_K from K.”
- “More precisely, a function f is a one-way function if, for any argument x in the domain of f, it is easy to compute the corresponding value f(x), yet, for almost all y in the range of f, it is computationally infeasible to solve the equation y = f(x) for any suitable argument x.”
- “Trap doors have already been seen in the previous paragraph in the form of trap-door one-way functions, but other variations exist. A trap-door cipher is one which strongly resists cryptanalysis by anyone not in possession of trap-door information used in the design of the cipher.”
- “For example a quasi one-way function is not one-way in that an easily computed inverse exists. However, it is computationally infeasible even for the designer, to find the easily computed inverse. Therefore a quasi one-way function can be used in place of a one-way function with essentially no loss in security.”
quotes
- “We stand today on the brink of a revolution in cryptography”
- “In the nineteen twenties, however, the “one time pad” was invented, and shown to be unbreakable”
- “Any channel may be threatened with eavesdropping or injection or both, depending on its use. In telephone communication, the threat of injection is paramount, since the called party cannot determine which phone is calling. Eavesdropping, which requires the use of a wiretap, is technically more difficult and legally hazardous. In radio, by comparison, the situation is reversed. Eavesdropping is passive and involves no legal hazard, while injection exposes the illegitimate transmitter to discovery and prosecution.”
- “Much as error correcting codes are divided into convolutional and block codes, cryptographic systems can be divided into two broad classes: stream ciphers and block ciphers.”
- “Care must be taken, however, to use a system in which small changes in the ciphertext result in large changes in the deciphered plaintext. This intentional error propagation ensures that if the deliberate injection of noise on the channel changes a message such as ‘erase file 7’ into a different message such as ‘erase file 8,’ it will also corrupt the authentication information. The message will then be rejected as inauthentic.”
- “A chosen plaintext attack is difficult to achieve in practice, but can be approximated. For example, submitting a proposal to a competitor may result in his enciphering it for transmission to his headquarters. A cipher which is secure against a chosen plaintext attack thus frees its users from concern over whether their opponents can plant messages in their system.”
- “The chosen plaintext attack is often called an IFF attack, terminology which descends from its origin in the development of cryptographic ‘identification friend or foe’ systems after World War II. An IFF system enables military radars to distinguish between friendly and enemy planes automatically. The radar sends a time-varying challenge to the airplane which receives the challenge, encrypts it under the appropriate key, and sends it back to the radar. By comparing this response with a correctly encrypted version of the challenge, the radar can recognize a friendly aircraft. While the aircraft are over enemy territory, enemy cryptanalysts can send challenges and examine the encrypted responses in an attempt to determine the authentication key in use, thus mounting a chosen plaintext attack on the system.”
- “We examine two approaches to this problem, called public key cryptosystems and public key distribution systems, respectively. The first are more powerful, lending themselves to the solution of the authentication problems treated in the next section, while the second are much closer to realization.”
- “And it is at least conceptually simpler to obtain an arbitrary pair of inverse matrices than it is to invert a given matrix. Start with the identity matrix I and do elementary row and column operations to obtain an arbitrary invertible matrix E. Then starting with I do the inverses of these same elementary operations in reverse order to obtain D = E^(-1). The sequence of elementary operations could be easily determined from a random bit string”
- Diffie-Hellman cryptosystem:
“We now suggest a new public key distribution system which has several advantages. First, it requires only one ‘key’ to be exchanged. Second, the cryptanalytic effort appears to grow exponentially in the effort of the legitimate users. And, third, its use can be tied to a public file of user information which serves to authenticate user A to user B and vice versa. By making the public file essentially a read only memory, one personal appearance allows a user to authenticate his identity many times to many users. Merkle’s technique requires A and B to verify each other’s identities through other means.
The new technique makes use of the apparent difficulty of computing logarithms over a finite field GF(q) with a prime number q of elements. Let Y = α^X mod q for 1 ≤ X ≤ q−1, where α is a fixed primitive element of GF(q); then X is referred to as the logarithm of Y to the base α, mod q: X = log_α Y mod q for 1 ≤ Y ≤ q−1. Calculation of Y from X is easy, taking at most 2×log_2(q) multiplications. For example, for X = 18, Y = α^18 = (((α^2)^2)^2)^2 × α^2. Computing X from Y, on the other hand, can be much more difficult and, for certain carefully chosen values of q, requires on the order of q^(1/2) operations, using the best known algorithm” (see the toy example after this quote list)
- “It is important to note that we are defining a function which is not invertible from a computational point of view, but whose noninvertibility is entirely different from that normally encountered in mathematics. A function f is normally called ‘noninvertible’ when the inverse of a point y is not unique, (i.e., there exist distinct points x_1 and x_2 such that f(x_1) = y = f(x_2)). We emphasize that this is not the sort of inversion difficulty that is required. Rather, it must be overwhelmingly difficult, given a value y and knowledge of f, to calculate any x whatsoever with the property that f(x) = y.”
- “It may be, however, that rapid computation of fn precludes f from being one-way.”
- “For the system to be secure, computation of the key from the keystream must be computationally infeasible. While, for the system to be usable, calculation of the keystream from the key must be computationally simple. Thus a good key generator is, almost by definition, a one-way function.”
- “A public key cryptosystem can be used to generate a one-way authentication system.”
“The converse does not appear to hold, making the construction of a public key cryptosystem a strictly more difficult problem than one-way authentication. Similarly, a public key cryptosystem can be used as a public key distribution system, but not conversely.”
- “In consequence of this, judging the worth of new systems has always been a central concern of cryptographers.”
- “As systems whose strength had been so argued were repeatedly broken, the notion of giving mathematical proofs for the security of systems fell into disrepute and was replaced by certification via cryptanalytic assault.”
- “This example demonstrates both the great promise and the considerable shortcomings of contemporary complexity theory. The theory only tells us that the knapsack problem is probably difficult in the worst case. There is no indication of its difficulty for any particular array.”
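A toy Rust rendering of this exchange (my own, not from the paper), with deliberately tiny and insecure parameters so the numbers can be checked by hand:

```rust
// Square-and-multiply modular exponentiation, mirroring the
// "at most 2 × log2(q) multiplications" remark in the quote.
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut result = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    result
}

fn main() {
    let (q, alpha) = (23u64, 5u64);   // public parameters (toy-sized, insecure)
    let (x_a, x_b) = (6u64, 15u64);   // private exponents of users A and B
    let y_a = mod_pow(alpha, x_a, q); // A publishes alpha^x_a mod q
    let y_b = mod_pow(alpha, x_b, q); // B publishes alpha^x_b mod q
    let k_a = mod_pow(y_b, x_a, q);   // A computes (alpha^x_b)^x_a
    let k_b = mod_pow(y_a, x_b, q);   // B computes (alpha^x_a)^x_b
    assert_eq!(k_a, k_b);             // both arrive at the same shared key
    println!("shared key: {k_a}");
}
```

An eavesdropper sees q, α, Y_A and Y_B but would have to compute a discrete logarithm to obtain either private exponent.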
summary
A historically important document. The paper has SoK-style content giving a look at contemporary asymmetric cryptography in 1976. I liked the high-level overview and the collection of established vocabulary. The only part I failed to understand was the math related to “kn = 5000” after parameterizing Leslie Lamport's system.
It was very interesting to see his discussion of the relation to complexity theory. In particular, I have never seen this written in such clear words: “As systems whose strength had been so argued were repeatedly broken, the notion of giving mathematical proofs for the security of systems fell into disrepute and was replaced by certification via crypanalytic assault.”
“We stand today on the brink of a revolution in cryptography” is Diffie and Hellman's opening statement, which is justified by the last sentence of the paper: “We hope this will inspire others to work in this fascinating area in which participation has been discouraged in the recent past by a nearly total government monopoly”.
typo
- page 653, “intergers”
- page 653, “crypanalytic”
Number "Not" Used Once - Key Recovery Fault Attacks on… §
Title: “Number "Not" Used Once - Key Recovery Fault Attacks on LWE Based Lattice Cryptographic Schemes” by Prasanna Ravi, Shivam Bhasin, Anupam Chattopadhyay [url] [dblp]
Published in 2018 at COSADE 2019 and I read it in 2020-05
Abstract: This paper proposes a simple single bit flip fault attack applicable to several LWE (Learning With Errors Problem) based lattice based schemes like KYBER, NEWHOPE, DILITHIUM and FRODO which were submitted as proposals for the NIST call for standardization of post quantum cryptography. We have identified a vulnerability in the usage of nonce, during generation of secret and error components in the key generation procedure. Our fault attack, based on a practical bit flip model (single bit flip to very few bit flips for proposed parameter instantiations) enables us to retrieve the secret key from the public key in a trivial manner. We fault the nonce in order to maliciously use the same nonce to generate both the secret and error components which turns the LWE instance into an exactly defined set of linear equations from which the secret can be trivially solved for using Gaussian elimination.
ambiguity
What is z in the provided interval in section 2.6? Seems to be either an arbitrary integer or z of Algorithm 1 (NewHope)
error
“later converted to the NTT domain using the PolyBitRev function” … no, it is a separate step
inconsistency
- t = A ⋅ s1 + s2 in flow text in section 2.5
- t = A × s1 + s2 in Algorithm 4
quotes
- “Nonces are predominantly used in all the afore mentioned schemes in order to reduce the amount of randomness required to generate the secret and error components used in the generation of the LWE instance during key generation.”
- “main focus on the key generation algorithm, as that is the target of our attack.”
- “We also use x ← Sη to denote the module x whose coefficients lie in the range [−η, η].”
- About Kyber: “Apart from this, there are other modifications to the scheme like the modified technique for generation of the public module A as in [4], compressed public key and ciphertexts through "Bit-dropping" using the Learning-with-rounding (LWR) problem”
- “every coefficient of the recovered t is only a perturbed version of the original t generated during the key-generation procedure KYBER.CPAPKE.GEN(). Thus, the public key can be assumed to be built based on the hardness of both the MLWE and MLWR problem.”
- About Frodo: “The modulus q is chosen to be a power of 2 that enables easy reduction through bit masking.”
- “For example, if the error component is an all zero vector, then the LWE instance is converted into a set of linear equations with equal number of equations and unknowns. This instance can be solved using straight forward Gaussian elimination. If the error component only has values in a fixed interval [z + 1/2, z - 1/2], then one can just "round away" the non-integral part and subtract z to remove the error from every sample [29]. There are also other easy instances of LWE, For eg. From a given set of n LWE instances, if k of the n error components add up to zero, then one can simply add the corresponding samples to cancel the error and obtain an error free sample. It is also possible to solve an LWE instance in roughly nd time and space using nd samples if the error in the samples lie in a known set of size d [5]. For a very small d, this yields a very practical attack.”
- “Thus, if an attacker can perform a single bit flip of the nonce such that both the calls to the function poly_sample use the same seed, then it yields s = e.”
- “Thus, ultimately the attacker has to inject a minimum of just two bit flips in order to create a very easy LWE instance that results in direct retrieval of the secret key S through Gaussian elimination.”
- “In KYBER, the dimensions of both s1 and s2 are the same and equal k. But, the generated MLWE instance t = a × s1 + s2 is further protected by the hardness of the MLWR problem through the Compressq function and hence the public key is formed by a combination of MLWE and MLWR instances.”
- “Since the nonce is deterministically generated, one can use an error correction scheme to check if the correct value of nonce is being used for the generation of polynomials.”
- “The motivation to use a nonce for generation of polynomials is mainly to reduce the randomness requirement and use the same seed to generate multiple polynomials.”
summary
- paper from 2019
- public key in Learning With Errors problem is given by A×s + e
- If s = e, then A×s + s gives a trivial linear equation system to solve to get the secret key s (spelled out after this list)
- if the nonce is the same between the generation of s and e, then s = e
- stuck-at fault attack on nonces in NewHope, Kyber, Dilithium, and Frodo schemes
- trivial for NewHope (keep poly_sample(&shat, noiseseed, 0) the same, apply stuck-at(0) to poly_sample(&ehat, noiseseed, 1))
- Technically more difficult for Frodo, because stuck-at(1) must be applied to Frodo.SampleMatrix(seed_E, n, n_bar, T_chi, 2); requires two stuck-at bits, unlike NewHope
- For Dilithium, l stuck-ats are required, but then the equation system is only solvable if enough equations (i.e. signatures) are available
- For Kyber, k stuck-ats are required, but the equation system is not necessarily solvable, since rounding still protects it
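Spelled out in my own notation (not a quote from the paper), the faulted instance collapses as follows:

```latex
\begin{align*}
  t &\equiv A\,s + e \pmod{q} && \text{honest LWE public key} \\
  t &\equiv A\,s + s \equiv (A + I)\,s \pmod{q} && \text{after the nonce fault forces } e = s \\
  s &\equiv (A + I)^{-1}\,t \pmod{q} && \text{Gaussian elimination, whenever } A + I \text{ is invertible mod } q
\end{align*}
```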
Short paper, tiny errors, no practical implementation (how successfully can the attack on Frodo be mounted? Correction: unlike the paper, their presentation slides show evaluation data). The structure of the text is beautiful and it is well written. I particularly enjoyed the summary of weak LWE instances. Masking is also another countermeasure, but it is heavyweight compared to the proposed error-correcting codes.
typo
The aforementioned fault vulnerabilities associated with the nonce only occur due to the implementation strategies used and hence can we believe can be easily corrected albeit with a small performance overhead.
⇒ hence can, we believe, be easily corrected
On the Security of Password Manager Database Formats §
Title: “On the Security of Password Manager Database Formats” by Paolo Gasti, Kasper B. Rasmussen [url] [dblp]
Published in 2012 at ESORICS 2012 and I read it in 2021-05
Abstract: Password managers are critical pieces of software relied upon by users to securely store valuable and sensitive information, from online banking passwords and login credentials to passport- and social security numbers. Surprisingly, there has been very little academic research on the security these applications provide.
question
Why does Definition 2 (IND-CDBA security) include 1/2 + negl(κ) as threshold whereas Definition 3 (MAL-CDBA security) uses negl(κ)?
Game 1 (which Definition 2 relies upon) asks to decide upon one binary value (⇒ 50% with guessing strategy). Game 2 (which Definition 3 relies upon) asks to create a new, valid DB (⇒ 0% probability if guessing is your strategy).
quotes
- “Users typically solve this problem in one of two ways. A common solution is to reuse the same password on many different websites. This approach increases the potential damage if a password is stolen, cracked, or if a service that has access to it is compromised, since the attacker will be able to reuse it on all online services that share the password. Another approach is to use a “password manager” to store strong (random) passwords for each site. A password manager is a piece of software that requires a user to remember a single strong master password, used to decrypt the password manager’s database. Remembering a single master password is much more feasible for users, who still get the security benefits of using a different password for each online service.”
- “As such, users who rely on password managers are less susceptible to typo-squatting and phishing attacks [11,20]: even if a user is directed to a malicious website that is designed to look identical to the website the user expects, the password manager will not log in automatically, providing an extra layer of protection.”
- “Several producers of password managers suggest storing password databases on USB sticks [35, 37, 40], in the cloud [1, 24] or on mobile devices [2, 4, 27], to allow convenient access to stored passwords.”
- “Note that we do not attempt to provide an exhaustive list of all possible attacks on all password managers. Rather, we model the security provided by common password manager database formats and provide examples of practical attacks.”
- “Advr who has read access to the password database, and Advrw who has read-write access. The goal of both adversaries is to extract as much information as possible and, for Advrw, to produce a database that (1) was not created by the user and (2) once opened, will not trigger any warning or error message from the password manager.”
- “Consider an adversary who has full access to an encrypted password database, and is able to record different versions of it. Such an adversary can clearly use any of the recorded versions to replace the current database, as long as the master password did not change.”
- Untrusted storage: “Consider an adversary who has full access to an encrypted password database, and is able to record different versions of it. Such an adversary can clearly use any of the recorded versions to replace the current database, as long as the master password did not change. […] it cannot be mitigated by the database format alone. Therefore we exclude it from our analysis.”
- Firefox: “URLs are always stored unencrypted regardless of whether a master password is used or not.”
- Microsoft Internet Explorer: “The encryption is performed using the CryptProtectData [29] system call, which uses Triple-DES in CBC mode and a hash-based MAC.”
- 1Password:
- “Entries are listed in an index file called ‘content.js’.”
- “The encryption scheme used is AES-128 in CBC mode. Neither the records nor the index file are integrity protected.”
- KDBX4 (aka KeePass 2.x): “bdy is encrypted using AES-256 in CBC mode, although Twofish is also available. The first 32 bytes of bdy contains the encryption of the ssbytes field in order to efficiently verify whether the provided master password is correct.”
summary
Well written paper with a clear agenda. It is perfectly structured and can be followed easily.
The paper defines two notions of security and evaluates password managers against this notion. The notion is well defined, but the practicality is debatable:
- “We argue that MAL-CDBA security (together with IND-CDBA security) is an appropriate security notion for a password manager database format in practice.”
- “We defined two realistic security models, designed to represent the capabilities of real-world attacks.”
I don't think so. Adding entries to the password manager has a different impact on security than actually replacing existing entries. To retrieve an actual password, the adversary has to set up a typo-squatting domain and then replace the stored URL with the malicious one. Such a scenario is discussed in the paper, was evaluated and is practical, but the security notion does not distinguish between modification and extension. A pure extension attack has a much more limited impact.
I enjoyed the proper discussion of related work on security notions and the formal definition of password managers. As of 2021, some results are deprecated (e.g. Internet Explorer does not exist anymore) and some password managers are not maintained anymore (e.g. PINs). Recommendable for everyone who wants to see how the database formats are defined.
- “In the rest of this paper focus solely on database formats and the security they provide, rather than on each password manager implementation. We assume that the password managers themselves correctly implement what the format specifies. As such, we do not consider, e.g., side channel attacks on the cryptographic primitives, or other attacks against the implementation. Rather we investigate the best possible security achievable given a specific storage format.”
- What is typo-squatting? The practice of serving a malicious, similar looking website if you spell the URL/domain name incorrectly.
- password manager = {
Setup: (security parameter) → mp,
Create: (mp, record-set) → DB,
Open: (mp, DB) → RS ‖⊥,
Valid: (mp, DB) → 1 ‖ 0
} (see the Rust sketch after this list)
- “We also define two new games, which we call indistinguishability of databases game (IND-CDBA) and malleability of chosen database game (MAL-CDBA).”
- IND-CDBA security ⇒ You decide 2 records. I must distinguish records in encrypted form.
- Malleability of chosen database game (MAL-CDBA) ⇒ I generate records. You encrypt them. I can create a different encrypted file which is valid.
- Neat discussion of security notions at page 6 (top) and in the Appendix.
- 2 times “The application is available upon request.” ⇒ Why not open-source?!
- Class 1 (trustworthy): PasswordSafe v3
Class 2 (never rely on information in the database): KDBX4 & PINs
Class 3 (do not use unless you take other measures to ensure data integrity): … others
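The four-algorithm abstraction above maps naturally onto a Rust trait. This is purely my own illustration (the names, types and record layout are invented placeholders), not code or notation from the paper:

```rust
// Hypothetical rendering of the (Setup, Create, Open, Valid) abstraction.
type MasterPassword = String;
type Record = (String, String, String); // e.g. (url, username, password)
type RecordSet = Vec<Record>;
type Database = Vec<u8>;                // serialized, encrypted database file

trait PasswordManager {
    /// Setup: (security parameter) -> mp
    fn setup(security_parameter: usize) -> MasterPassword;
    /// Create: (mp, record set) -> DB
    fn create(mp: &MasterPassword, records: &RecordSet) -> Database;
    /// Open: (mp, DB) -> record set, or None for ⊥
    fn open(mp: &MasterPassword, db: &Database) -> Option<RecordSet>;
    /// Valid: (mp, DB) -> 1 ‖ 0
    fn valid(mp: &MasterPassword, db: &Database) -> bool;
}
```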
typo
- page 2, “poplar” → “popular”
- page 3, Table 1, “where” → “were”
On the criteria to be used in decomposing systems into… §
Title: “On the criteria to be used in decomposing systems into modules” by David Lorge Parnas [url] [dblp]
Published in 1972 at CACM, Volume 15, 1972 and I read it in 2021-03
Abstract: This paper discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time. The effectiveness of a "modularization" is dependent upon the criteria used in dividing the system into modules. Two system design problems are presented, and for each, both a conventional and unconventional decomposition are described. It is shown that the unconventional decompositions have distinct advantages for the goals outlined. The criteria used in arriving at the decompositions are discussed. The unconventional decomposition, if implemented with the conventional assumption that a module consists of one or more subroutines, will be less efficient in most cases. An alternative approach to implementation which does not have this effect is sketched.
quotes
- “A well-defined segmentation of the project effort ensures system modularity”
- “The major advancement in the area of modular programming has been the development of coding techniques and assemblers which (1) allow one module to be written with little knowledge of the code in another module, and (2) allow modules to be reassembled and replaced without reassembly of the whole system”
- “Below are several partial system descriptions called modularizations. In this context ‘module’ is considered to be a responsibility assignment rather than a subprogram. The modularizations include the design decisions which must be made before the work on independent modules can begin.”
- “The system is divided into a number of modules with well-defined interfaces; each one is small enough and simple enough to be thoroughly understood and well programmed.”
- “In the first modularization the interfaces between the modules are the fairly complex formats and table organizations described above.”
- “In the second modularization the interfaces are more abstract; they consist primarily in the function names and the numbers and types of the parameters”
- “In the first decomposition the criterion used was to make each major step in the processing a module”
- “The second decomposition was made using ‘information hiding’ as a criterion”
- “In discussions of system structure it is easy to confuse the benefits of a good decomposition with those of a hierarchical structure”
- “We have tried to demonstrate by these examples that it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart. We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others”
summary
First, someone needs to get familiar with KWIC (recognize the paper reference in the section below). KWIC felt like an arbitrary index someone came up with. I got confused by phrases like “the characters are packed four to a word”, which make little sense outside the index context of a book. But after reading the paper, I looked up the Wikipedia article and learned about its use case (an index of keywords before full-text search was available, or in print). The paper is considered an ACM Classic and he got high praise for it.
Essentially, first I had to understand the setting when this paper was written. I grew up with data encapsulation in object-oriented programming, local scoping in programming languages and manipulating data behind a pointer was already a foreign, dangerous idea. The setting is the transition of assembler language towards more high-level languages with questions regarding information hiding arising.
In modular design, his double dictum of high cohesion within modules and loose coupling between modules is fundamental to modular design in software. However, in Parnas's seminal 1972 paper On the Criteria to Be Used in Decomposing Systems into Modules, this dictum is expressed in terms of information hiding, and the terms cohesion and coupling are not used. He never used them.
–Wikipedia: David Parnas
I would define a module as a set of functionality (independent of its representation in a programming language). High cohesion within modules and loose coupling between modules is a defining criterion for a good programmer. What I consider an open question, and what often triggers bugs, is the missing documentation for the interface between modules. Often a data structure transfers the data from one module to another. An informal description often triggers different expectations regarding the content of the data structure.
Back to the paper, it illustrates the decomposition of a system by two exemplary modularizations. Whereas the first decomposition was created along the major steps of the processing routines, the second decomposition was created with information hiding in mind. Then several recommendable criteria for decompositions are mentioned:
- A data structure, its internal linkings, accessing procedures and modifying procedures are part of a single module. (see the sketch after this list)
- The sequence of instructions necessary to call a given routine and the routine itself are part of the same module
- The formats of control blocks used in queues in operating systems and similar programs must be hidden within a “control block module.”
- Character codes, alphabetic orderings, and similar data should be hidden in a module for greatest flexibility
- The sequence in which certain items will be processed should (as far as practical) be hidden within a single module
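A small Rust sketch of the first criterion, entirely my own and not from the paper: a line-storage module for a KWIC-like program that hides how lines and words are represented behind accessing and modifying procedures, so callers never learn (or depend on) how the characters are packed.

```rust
mod line_storage {
    /// Hidden representation: words per line. Callers cannot see or rely on it.
    pub struct LineStorage {
        lines: Vec<Vec<String>>,
    }

    impl LineStorage {
        pub fn new() -> Self {
            LineStorage { lines: Vec::new() }
        }
        /// Modifying procedure, kept in the same module as the data it changes.
        pub fn add_line(&mut self, words: &[&str]) {
            self.lines.push(words.iter().map(|w| w.to_string()).collect());
        }
        /// Accessing procedure: callers ask for word (line, index), not for bytes.
        pub fn word(&self, line: usize, index: usize) -> Option<&str> {
            self.lines.get(line)?.get(index).map(String::as_str)
        }
        pub fn line_count(&self) -> usize {
            self.lines.len()
        }
    }
}

fn main() {
    let mut storage = line_storage::LineStorage::new();
    storage.add_line(&["on", "the", "criteria"]);
    assert_eq!(storage.word(0, 1), Some("the"));
    assert_eq!(storage.line_count(), 1);
}
```

Swapping the internal representation (e.g. packing several characters per machine word, as the paper's first modularization exposes) would not change this interface.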
In the end, I think the paper advocates a clean style which (in some sense and with some limitations) is still valid today (e.g. web frameworks heavily limit the way you can define your architecture). I recommend that every programmer reflect on possible decompositions of a system, because the most intuitive one might not be the best. The notion of the flowchart approach as the sequential one is, however, awkward and foreign to me.
PDF/A considered harmful for digital preservation §
Title: “PDF/A considered harmful for digital preservation” by Marco Klindt [url] [dblp]
Published in 2017 at iPRES 2017 and I read it in 2020-12
Abstract: Today, the Portable Document Format (PDF) is the prevalent file format for the exchange of fixed content electronic documents for publication, research, and dissemination work in the academic and cultural heritage domains. Therefore it is not surprising that PDF/A is perceived to be an archival format suitable for digital archiving workflows.
clarification needed
- “PDF reduces the computational burden of the display device by executing the necessary PostScript programs during the creation of the PDF file.”
  - PostScript is presented as a computational burden.
  - PDF is consequently described as an object store of PostScript elements.
  - How does this reduce the computational burden?
  - I assume that reuse of objects is the answer, but stating it as such would be useful.
- “Fonts with open licenses like SIL Open Font License 4 circumvent possible restrictions but also complicate conversion due to differences in substitute font dimensions.”
  - not an inherent property of OFL fonts?
errors
“The textual markup of Markdown variants is machine actionable while being human friendly to read at the same time. It is suitable for structured texts (including lists and tables) where the exact layout is not as important. Markdown is not well suited for validation.”
- lists are troublesome as nested lists occur in various flavors and were not considered in its initial design
- tables were not part of the first Markdown design and various contradicting implementations exist
quotes
Interesting:
- “In a quick analysis of institutional repositories hosted at the ZIB, the siegfried file identification tool 1 identified 44,114 or 84% from a total of 52,611 documents as PDF (and 1,168 or 0.03% of these as PDF/A). Other file formats included Word, WordPerfect, PostScript files and a long tail of more obscure document formats.”
- “Digital preservation is primarily concerned with keeping information contained in digital objects or documents usable for future use.”
- “The accepted reference model for digital preservation systems is the Open Archival Information System (ISO 14721:2012, OAIS)[11]”
- “The usage of and commercial success began with the release of the free Acrobat Reader 2.0 in 1996 for PDF 1.1 and licensing all patents royalty free for everyone using its format. It became the de-facto exchange format for electronic documents and version 1.7 was finally standardized by the International Standards Organization as ISO 32000-1[15] in 2008.”
- “Until the 10.1.5 and 11.0.01 updates Adobe Acrobat products have historically opened a PDF as long as the %PDF-header started anywhere within the first 1024 bytes of the file.”
- “To extract information from content in PDF, tags can be attached to PDF objects from version 1.4 onward. These tags act as markup to denote the logical structure (semantic elements), and logical order (flow) of the content.”
- “All content shall be marked in the structure tree with semantically appropriate tags (i.e. headings, formulas, paragraphs and such) in the logical, intended reading order.”
- “PDF also does not provide different perspectives on textual content. Electronic documents may want to provide different views of the text or data, either in multiple languages, diplomatic or critical transcriptions, or from different sources.”
- “Usability issues aside, Willinsky et al.[32] give an excellent overview about current issues with using PDF in the scholarly environment. They hope, that their observations will influence further development of PDF or even the ‘Great PDF Replacement Format (GPDFRF)’.”
- “Converting “normal” PDFs to PDF/A a-level conformance automatically is not advisable as a lot of information may already be lost during the creation process of the document.”
- “But even PDF/A a-level conformance may not guarantee full text recovery due to the fact that some tagging features are only recommendations and not mandatory. Hyphenation (the word division at the end of a line) shall be treated as an incidental artifact and be represented as a unicode soft-hyphen (U+00AD) instead of a hard-hyphen (U+002D) as suggested by the standard.”
- “It is alternatively possible to provide the /ActualText attribute without the hyphen.”
- “For some time, the go-to-tool for PDF/A validation was JHOVE 5 using PDF profiles. As it was discovered that it was not suitable for validating PDF/A files[29], the EU funded PREFORMA project 6 included a provision to create veraPDF 7 , a validator which aims at checking conformance of all PDF/A flavors while also allowing for policy checks that are customizable to institutional policy.”
- “Some possible strategies for the better handling of PDFs mostly involve the content producers but also create more involved workflows within the archive:
- Negotiate non-PDF documents better suited for their domain and supported by your archive system.
- Consider using PDF/A as a dissemination format only (and therefore use a PDF rendition server only for access not ingest).
- Save the original source documents alongside the PDFs for full text and structure retention. With PDF/A-3 these could be embedded and linked as source of the document.
- Require data producers to implement workflows that adhere to the Matterhorn protocol to assure fully, meaningful tagged PDFs (including MathML formulas, semantically tagged data and so on) and to provide /ActualText for every textual information contained in the PDF that is not easily extractable otherwise.“
- “WebArchive (WARC) files bundle all necessary components and are already in use in digital archiving.”
Well put:
- “While humans have the ability to recognize the structure of text from layout, which is a necessary requirement for meaningful extraction of information and therefore gaining knowledge from texts and illustrations including diagrams, formulas, and tables, machine-based technology is not yet able to achieve this to the same extent. This makes it difficult for such technology to use or reuse the information contained in PDFs.”
- “Adobe extended the PDF specification multiple times over the years to allow for more features like encryption, transparency, device-independent colors, forms, web-links, javascript, audio, video, 3D objects and many more[18].”
- “PDF supports incremental updates of its content. New objects, a new cross-reference table and a new trailer can be appended to the end of the file, if the content of the PDF is updated, without the need to rewrite the whole file. As objects can be marked as deleted in the xref-table there is no need to delete the corresponding objects in the body section.”
- “PDF can also define different rectangles useful in print like crop boxes, bleed boxes, trim boxes, and art boxes (refer to the PDF reference[7] for additional information).”
- “A standard for required tag usage was published by ISO as ISO 14289[10] known as PDF/UA in 2014 (thus after the publication of PDF/A-2/3). Even though being accessible by AT (i.e. software) is a legal requirement in some domains, creating compliant documents is still a complex and cumbersome endeavor. Even assessing compliance to PDF/UA is quite hard: The Matterhorn protocol[24] provides a testing model that defines 31 checkpoints comprised of 136 failure conditions encompassing file format requirements for AT accessible PDF/UAs of which some are not applicable to PDF/A (e.g. related to javascript). While 87 failure conditions are determinable by software 47 usually require human judgement or assessment. Failure condition 06-003 for example is machine testable and requires the metadata stream to contain a dublincore:title while 06-004 requires that the title clearly identifies the document in respect to human knowledge, a check that obviously is not decidable by algorithms.”
- “Nielsen [23] argued in 2001 that the fixed, page-based layout of PDF is not well suited for on-screen reading in contrast to web pages or other hypertext documents.”
- “If optical character recognition results are available they also are embedded into the PDF as a invisible text layer over the corresponding areas in the image of the original.”
- “An insightful analogue of the difference between human content understanding and machine extraction capabilities would be the visible communication of music. While storing the layout of sheet music is perfectly achievable with PDF the placement of note glyphs on lines with annotating glyphs for bars, clefs and so on, it is easily understood and transformed into audible sound by humans trained in reading musical notation. A machine would have a hard time extracting enough information to reproduce or compare the musical score.”
- “What constitutes a word and finding word boundaries might be difficult by itself depending on the layout or script of the text. Selecting rows or columns from tables in PDF reader applications often also results in frustration.”
- “Searching for the string ”Rheinland” (German for Rhineland, a part of Germany) in the PDF/A-1a file of the nestor newsletter number 28[22] for example would result in no matches in macOS Preview or Adobe Reader as it is stored as a hard-hyphen. The hyphen in ”Ostwestfalen-Lippe” is a regular one.”
- “PDF/A is perceived to be an archival solution for digital documents. Discussion within the community revealed the reason for that is three-fold: Firstly, it is marketed as an archival format. The A in PDF/A might stand for “Archive” or “Archival” or simply for the letter “A”; I haven’t found any official explanation for the choice of A in the acronym. The second reason may be that it is used by so many institutions to a point where a critical mass is reached. They cannot altogether err in their risk assessment, so the reasoning is that you simply cannot be wrong when you run with the flock. And thirdly, there does not seem to be a better alternative available (see below).”
- “Pages are useful for citation in the traditional format of books or journals but with the advancement of digital publishing and linked data technologies it will be more useful to refer to information sets identified (and locatable) by persistent digital identifiers like URIs or IRIs.”
- “One teenager even wondered why YouTube isn’t mentioned in a book from 2005.”
- “In contrast to XHTML, an XML language, it is very robust to formal errors.”
- “Sullivan reports in her 2003 article (emphasis added): ‘The intent was not to claim that PDF-based solutions are the best way to preserve electronic documents. PDF/A simply defines an archival profile of PDF that is more amenable to long-term preservation than traditional PDF.’[27]”
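The hyphenation issue quoted above is easy to demonstrate (my own toy example, not from the paper): a word broken across a line with a hard hyphen U+002D defeats a plain substring search, while a soft hyphen U+00AD can simply be stripped before searching.

```python
# Text as it might be extracted from a PDF where "Rheinland" was broken across a line:
hard = "Rhein-land"        # hard hyphen U+002D survives text extraction
soft = "Rhein\u00adland"   # soft hyphen U+00AD marks an incidental line break

needle = "Rheinland"
print(needle in hard)                          # False: the search fails
print(needle in soft.replace("\u00ad", ""))    # True: soft hyphens can be stripped
```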
summary
This paper summarizes some shortcomings why/how PDF/A is not the final solution for long-term archival of documents.
PDF features not available in PDF/A (list from external resource):
- Embedded video and audio files
- encryption
- transparency
- Javascript
- executable files
- functions
- external links
- layers
I would summarize those arguments as a lack of annotations in PDFs that would make them accessible for machines and for people with visual disabilities. This makes it difficult to index and retrieve document information as part of a large corpus (→ database). The problems boil down to the design of PDF, which allows multiple ways of encoding text information and may thereby lose e.g. reading order information. Refer to Table 2, for example. In the end, the tooling support is not great, but a lack of alternatives can be identified. However, if we consider information extraction capabilities more important than e.g. font embedding and reproducible layout, some alternatives are mentioned in section 5.1 (e.g. HTML/CSS or ODF/OOXML). One intriguing analogy is given in the paper: musical typesetting can be done with PDF as output, but retrieving information about the music from it is very difficult.
In the conclusion the author admits that the PDF/A authors were aware of its shortcomings: “The intent was not to claim that PDF-based solutions are the best way to preserve electronic documents. PDF/A simply defines an archival profile of PDF that is more amenable to long-term preservation than traditional PDF”
In the end, the paper is a summary which does not provide any solution as pointed out by the author. As a critic of Markdown, I am saddened to see that Markdown was even mentioned (but other markup languages are neglected).
Piret and Quisquater’s DFA on AES Revisited §
Title: “Piret and Quisquater’s DFA on AES Revisited” by Christophe Giraud, Adrian Thillard [url] [dblp]
Published in 2010 at and I read it in 2020-06
Abstract: At CHES 2003, Piret and Quisquater published a very efficient DFA on AES which has served as a basis for many variants published afterwards. In this paper, we revisit P&Q’s DFA on AES and we explain how this attack can be much more efficient than originally claimed. In particular, we show that only 2 (resp. 3) faulty ciphertexts allow an attacker to efficiently recover the key in the case of AES-192 (resp. AES-256). Our attack on AES-256 is the most efficient attack on this key length published so far.
quotes
- “we show that only 2 (resp. 3) faulty ciphertexts allow an attacker to efficiently recover the key in the case of AES-192 (resp. AES-256).”
- “Since its publication in 1996, Fault Analysis has become the most efficient way to attack cryptosystems implemented on embedded devices such as smart cards. In October 2000, Rijndael was selected as AES and since then many researchers have studied this algorithm in order to find very efficient differential fault attacks. Amongst the dozen DFA on AES published so far, Piret and Quisquater’s attack published at CHES 2003 [5] is now a reference which has been used as a basis for several variants published afterwards.”
- “Therefore the last round key can be recovered by using 8 faulty ciphertexts with faults induced at chosen locations.”
- “From our experiments, an exhaustive search on AES-128 amongst 2^34 key candidates takes about 8 minutes on average on a 4-core 3.2Ghz Xeon by using a non-optimised C code. Therefore such an attack is practical.”
- “The triviality of the extension of Piret and Quisquater’s attack comes from the fact that, since MixColumns is linear, one can rewrite the last two rounds of the AES as depicted in Fig. 3.” → Figure 3 moves MixColumns into the last round and uses key MC⁻¹(K_{r-1}) for the second-to-last AddRoundKey operation (see the toy linearity check after this quote list)
- “To conclude, the original P&Q’s DFA on AES can uniquely identify the AES key in the 192 and 256-bit cases by using 4 faulty ciphertexts.”
- “From our experiments, an exhaustive search on AES-192 amongst 2^42 key candidates takes about 1.5 day on average on a 4-core 3.2Ghz Xeon by using a non-optimised C code. Therefore such an attack can be classified as practical.”
- “To conclude, one can reduce the number of candidates for the AES-192 key to 2^10 by using 3 faulty ciphertexts.”
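A toy check of the Figure 3 rewriting mentioned above: because MixColumns is linear over XOR, adding the round key K after it is equivalent to adding MC⁻¹(K) before it. The sketch below (my own) uses bit reversal of a byte as a stand-in involutory linear map L (so L⁻¹ = L); it only illustrates the algebra, not AES itself.

```python
import random

# Stand-in for MixColumns: an involutory, XOR-linear map on bytes (bit reversal),
# so Linv = L. Real MixColumns is likewise linear over GF(2), which is all the
# rewriting in Fig. 3 needs.
def L(x):
    return int(format(x, "08b")[::-1], 2)

random.seed(0)
for _ in range(1000):
    s, k = random.randrange(256), random.randrange(256)
    assert L(s ^ k) == L(s) ^ L(k)   # linearity over XOR
    assert L(s) ^ k == L(s ^ L(k))   # key addition after L == addition of Linv(k) before L
print("ok")
```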
summary
Good paper. Ideas are immediate. Not all attacks presented give runtimes, but the (more important) number of faulty ciphertexts is analyzed properly. The original P&Q attack is neat in general.
The original attack from 2003 uniquely identifies the key with 2 faulty ciphertext pairs with probability 98%.
typo
Figure 2 swaps the last SubBytes and ShiftRows operation
Power analysis attack on Kyber §
Title: “Power analysis attack on Kyber” by Alexandre Karlov, Natacha Linard de Guertechin [url] [dblp]
Published in 2021-09 at and I read it in 2021-09
Abstract: This paper describes a practical side-channel power analysis on CRYSTALS-Kyber key-encapsulation mechanism. In particular, we analyse the polynomial multiplication in the decapsulation phase to recover the secret key in a semi-static setting. The power analysis attack was performed against the KYBER512 implementation from pqm4 [1] running on STM32F3 M4-cortex CPU.
errors
- “Then the set of byte array of length k” → “Then the set of byte arrays of length k”
- “The goal of the attack is to recover the secret key during the step 3 of the key decapsulation, specifically at the line 1 in Kyber.CPAPKE.Dec(sk, c), which is a multiplication of two polynomials in the NTT domain.” → “The goal of the attack is to recover the secret key during the step 3 of the key decapsulation, specifically at the line 1 in indcpa_dec(sk, c), which is a multiplication of two polynomials in the NTT domain.”
- “In a semi-static setting, Bob computes the multiplication of the secret fixed polynomial with different ciphertexts in line 1 of Algorithm 3.” → “In a semi-static setting, Bob computes the multiplication of the secret fixed polynomial with different ciphertexts in line 1 of Algorithm 7.”
- “The goal of the CPA on KYBER is to recover Bob secret key.” → “The goal of the CPA on KYBER is to recover Bob's secret key.”
- “In a semi-static setting, the attacker intercepts N Alice’s ciphertexts and measures the power consumption for the basemul operation.” → “In a semi-static setting, the attacker intercepts N of Alice’s ciphertexts and measures the power consumption for the basemul operation.”
- “alignement of traces” → “alignment of traces”
- “One simple and straightforward countermeasure consists in avoiding using Kyber in a semi-static setting.” → “One simple and straightforward countermeasure consists of avoiding using Kyber in a semi-static setting.”
quotes
- “In this paper, we proposed a successful and practical correlation power analysis attack on KYBER512 implementation from pqm4 to recover the secret key in the decapsulation phase. With a sufficient number of traces, the attack is > 99% successful and accurate.”
summary
Idea is immediate (CPA on multiplication in decryption step, impl by pqm4, ChipWhisperer Pro with CW308 UFO and STM32F3). The scientific contribution is the evaluation data. A number of 200 traces suffices which is nice. The paper writing is very preliminary (e.g. definition of B in Bη missing, definition of semi-static setting, etc).
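A minimal, simulated sketch of the correlation idea (my own code under simplifying assumptions, not the authors' attack): a single secret coefficient s leaks the Hamming weight of c·s mod q for known ciphertext coefficients c, and Pearson correlation over 200 simulated traces singles out s. Kyber's basemul actually multiplies pairs of coefficients in the NTT domain, which this toy model ignores.

```python
import random

q = 3329
random.seed(1)

def hw(x):
    return bin(x).count("1")

secret = 1234                                              # one secret coefficient
cts = [random.randrange(q) for _ in range(200)]            # known ciphertext coefficients
traces = [hw((c * secret) % q) + random.gauss(0, 1) for c in cts]   # noisy HW leakage

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy)

# Correlate the traces against the Hamming-weight model for every key hypothesis.
best = max(range(q), key=lambda g: pearson([hw((c * g) % q) for c in cts], traces))
print(best == secret)   # True in this toy model
```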
Practical CCA2-Secure and Masked Ring-LWE Implementati… §
Title: “Practical CCA2-Secure and Masked Ring-LWE Implementation” by Tobias Oder, Tobias Schneider, Thomas Pöppelmann [url] [dblp]
Published in 2018 at TCHES 2018 and I read it in 2020-10
Abstract: During the last years public-key encryption schemes based on the hardness of ring-LWE have gained significant popularity. For real-world security applications assuming strong adversary models, a number of practical issues still need to be addressed. In this work we thus present an instance of ring-LWE encryption that is protected against active attacks (i.e., adaptive chosen-ciphertext attacks) and equipped with countermeasures against side-channel analysis. Our solution is based on a postquantum variant of the Fujisaki-Okamoto (FO) transform combined with provably secure first-order masking. To protect the key and message during decryption, we developed a masked binomial sampler that secures the re-encryption process required by FO. Our work shows that CCA2-secured RLWE-based encryption can be achieved with reasonable performance on constrained devices but also stresses that the required transformation and handling of decryption errors implies a performance overhead that has been overlooked by the community so far. With parameters providing 233 bits of quantum security, our implementation requires 4,176,684 cycles for encryption and 25,640,380 cycles for decryption with masking and hiding countermeasures on a Cortex-M4F. The first-order security of our masked implementation is also practically verified using the non-specific t-test evaluation methodology.
quotes
- “With parameters providing 233 bits of quantum security, our implementation requires 4,176,684 cycles for encryption and 25,640,380 cycles for decryption with masking and hiding countermeasures on a Cortex-M4F. The first-order security of our masked implementation is also practically verified using the non-specific t-test evaluation methodology.”
- “In this context, a basic semantically secure encryption scheme with parameters leading to a negligible amount of decryption errors is a requirement to achieve CCA2-security as discussed by Dwork, Naor, and Reingold [DNR04] when applying CCA2-transformations.”
- “The importance of CCA2-security is also reflected in the NIST submission requirements for post-quantum public-key encryption and key-exchange [NIS16] that explicitly ask to declare whether CCA2-security is achieved (see [NIS17] for the list of submissions).”
- “Our main contribution is a novel, provably first-order secured masking scheme and its non-trivial integration into a CCA2 conversion.”
- “With masking and hiding countermeasures our code achieves 2,669,559 cycles for key generation, 4,176,684 cycles for encryption, and 25,640,380 cycles for decryption.”
- “In comparison, our masking scheme thus outperforms previous masking approaches for ring-LWE by one million cycles.” (NOTE this is 97% of the original runtime)
- “In the scheme all elements are polynomials over R_q = ℤ_q[x]/<x^n + 1> where we always assume implicit reduction modulo q and reduction modulo x^n + 1 and only allow parameters for which it holds that 1 ≡ q mod 2n for q being a prime and n being a power-of-two.”
- “Otherwise, the large term a e1 r2 cannot be eliminated when computing c1 r2 + c2. An encoding of the n-bit message m is necessary as some small noise (i.e., e = e1 r1 + e2 r2 + e3) is still present after calculating c1 r2 + c2 and would prohibit the retrieval of the message after decryption. This also shows why the noise distribution is chosen to be rather small – a too big noise level would make reliable decoding impractical.”
- “Masking schemes for the ring-LWE encryption scheme have already been investigated by Reparaz, Roy, Vercauteren, and Verbauwhede in [RRVV15a, RRdC + 16]. The main idea of [RRVV15a, RRdC + 16] is to split the secret key r2 into two shares, compute the multiplication r2 · c1 separately on both shares and add c2 to one of the shares. The authors construct a masked decoder that takes both shares as input and checks whether certain pre-defined rules are satisfied or not. For half of all inputs no rule applies and the value cannot be decoded immediately. This is solved by adding a certain δ ∈ [0, q − 1] to the shares and restarting the decoding process up to 16 times. However, this process increases the decryption time and also the decryption error probability is increased by 19%, which has to be compensated by selecting lower noise sizes and thus leads to lower security.”
- “In follow-up work Reparaz, de Clercq, Roy, Vercauteren, and Verbauwhede [RdCR + 16] propose a different masking scheme. The authors exploit that the ring-LWE decryption is almost additively homomorphic. […] Note that this procedure includes an additional encryption of m'' during the decryption. Unfortunately, the addition of two ciphertexts implies that also the including error vectors are added and this again raises the decryption error probability of the scheme and lowers performance.”
- “Thus, we draw two conclusions for the implementation of practically secured ring-LWE encryption:
- Assuming a CPA-only attacker, the DPA attack on ring-LWE without masked decoding is impractical and thus no masked decoder is required.
- Assuming a CCA2 attacker, a CCA2-conversion has to be applied to ring-LWE. Otherwise, an attacker would be able to break the system without performing a DPA and thus rendering any side-channel countermeasures useless. The message m must not be stored unmasked in this setting.”
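A small sketch of the share-splitting idea of [RRVV15a] quoted above (my own code with toy parameters, not the paper's implementation): because the polynomial multiplication is linear, the secret r2 can be split into two additive shares that are multiplied with c1 independently and only recombined at the end.

```python
import random

q, n = 12289, 8            # toy parameters; only the algebra matters here
random.seed(2)

def polymul(a, b):
    """Multiplication in Z_q[x]/(x^n + 1) (negacyclic convolution)."""
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:
                res[k - n] = (res[k - n] - ai * bj) % q
    return res

def add(a, b):
    return [(x + y) % q for x, y in zip(a, b)]

c1 = [random.randrange(q) for _ in range(n)]
r2 = [random.randrange(q) for _ in range(n)]

# First-order masking: split r2 into two random shares, process them independently.
r2a = [random.randrange(q) for _ in range(n)]
r2b = [(x - y) % q for x, y in zip(r2, r2a)]

masked = add(polymul(c1, r2a), polymul(c1, r2b))
print(masked == polymul(c1, r2))   # True: the multiplication distributes over the shares
```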
summary
The paper presents a ring-LWE encryption scheme that combines a post-quantum variant of the Fujisaki-Okamoto transform (for CCA2 security) with provably first-order secure masking, including a masked binomial sampler for the re-encryption step required by FO. With masking and hiding countermeasures, the Cortex-M4F implementation needs 4,176,684 cycles for encryption and 25,640,380 cycles for decryption, and first-order security is verified with the non-specific t-test methodology.
typo
- “independently as \sum_{i=0}^{k-1} b_i - b'_i where” → “independently as \sum_{i=0}^{k-1} (b_i - b'_i) where”
- “These countermeasures are addition of a random” → “These countermeasures are addition of a random”
Practical Evaluation of Masking for NTRUEncrypt on ARM… §
Title: “Practical Evaluation of Masking for NTRUEncrypt on ARM Cortex-M4” by Thomas Schamberger, Oliver Mischke, Johanna Sepulveda [url] [dblp]
Published in 2019 at COSADE 2019 and I read it in 2021-10
Abstract: To protect against the future threat of large scale quantum computing, cryptographic schemes that are considered appropriately secure against known quantum algorithms have gained in popularity and are currently in the process of standardization by NIST. One of the more promising so-called post-quantum schemes is NTRUEncrypt, which withstood scrutiny from the scientific community for over 20 years.
quotes
- “With the use of SIMD instructions available in the Cortex-M4 microcontroller, we are able to implement additive masking without any significant performance overhead compared to an unmasked implementation.”
- “The main side-channel attack against NTRUEncrypt is the correlation power analysis attack (CPA) published in [10].”
- “We adapt the CPA of [10] for modern parameter sets that make use of so called trinary polynomials and show successful attack results. For this we change the multiplication algorithm in order to utilize the sparse structure of trinary polynomials.”
- “In a final step we show that a combination of the Random key rotation shuffling countermeasure [13] with our masked implementation provides a secured implementation against second-order attacks using two million traces on our setup.”
- “As the NTRUEncrypt algorithm has evolved over time, two different kinds of parameter sets were proposed. Their main difference is the choice of the modulo parameter p.”
- “In order to provide CCA-2 security the authors instantiate NTRU with the NAEP encryption scheme as described in [7].”
- “We limit the description of the algorithm to the decryption function, as this function is the only point during the algorithm where a known input, namely the ciphertext e, is combined with the secret key polynomial f , which is a necessary condition to mount side-channel attacks.”
- “Multiplication of two polynomials is performed with the circular convolution product in the corresponding ring. In [5] this product of two polynomials a(x) ∗ b(x) is defined as: \[ a(x) \otimes b(x) = \sum_{k=0}^{N-1} \left(\sum_{i+j\equiv k \pmod{N}} a_i b_j\right) x^k \] In other words, Eq. (3) can be seen as the multiplication of two polynomials with an additional reduction of the result by (x^N − 1) through polynomial long division.”
- “It has to be noted that neither the standardized version in IEEE-1363.1 [8] nor the NIST submission of NTRUEncrypt [14] defines a specific way of implementing the multiplication.”
- “In [1] the authors propose Algorithm 2 for the multiplication of a polynomial in R_q and a binary polynomial B(d). With this algorithm the authors substitute the multiplication of coefficients with additions based on the index of ones in the binary polynomial. As a binary polynomial is build to be sparse, the coefficients with the value zero can be skipped resulting in a lower number of additions to execute and therefore a faster multiplication.”
- “With this attack the authors target the multiplication of the private key f with the ciphertext e, as this is the only operation on the private key with an attacker controllable input.”
- “In addition to their attack the authors of [10] propose three different countermeasures:
- Random initialization of t: The temporary result array t is initialized with different random values r_i, which can help during the first register overwrite in a HD scenario.
- Masking of ciphertext e: With this countermeasure each individual coefficient e i is masked with a random value through modular addition. We give a detailed evaluation of this countermeasure in this work.
- Shuffling: The sequence of all d addition rounds can be shuffled randomly, as the order has no impact on the final result. In theory shuffling countermeasures can be defeated with an increased amount of traces, therefore the authors propose this countermeasure only in combination with masking.”
- “More recently a countermeasure named Random key rotation is proposed in [13].”
- “In accordance to [10] we use arithmetic masking with different masks on all coefficients of the ciphertext polynomial e.”
- “We provide two different masked implementations with the use of ARM assembly code.”
- “The parallel implementation of the masking countermeasure makes use of SIMD instructions of the DSP extension of an ARM Cortex-M4 architecture.”
- “The implicit reduction modulo 2^16 does not change the result of the multiplication as all coefficients of the result are reduced modulo 2^11, with q = 2048 for modern parameter sets.”
- “All attacks are performed with power measurements of a STM32F303RCT7 ARM Cortex-M4 microcontroller mounted on the NewAE CW308 UFO board.”
- “There is no significant correlation visible for the first-order attack using up to two million trace measurements. In contrast the second-order attack is successful with two hundred thousand traces.”
- “It can be seen that a second-order attack is less effective on the parallel implementation and therefore we recommend this implementation as it also shows an reduced execution time in comparison with the sequential one.”
- “The shuffling method works by generating a random integer i in the range 0 ≤ i < N − 1 and circular shifting the coefficients of f to the right by i positions.”
summary
A good read and average academic paper.
This paper presents a masking scheme for the NIST round 1 submission NTRUEncrypt, variants NTRU-443 and NTRU-743. The authors discuss 4 proposed countermeasures (1 of them is masking) and conclude that two of the countermeasures should only be applied in combination with masking. They go on to mask the polynomial multiplication per coefficient in the decryption as m = (f * (e + masks) - (f * masks)) = f * e. Then they evaluate the countermeasure on a STM32F303RCT7 ARM Cortex-M4 microcontroller mounted on the NewAE CW308 UFO board.
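A toy check of that masking identity (my own code with made-up small parameters): the coefficient-wise additive masks on the ciphertext cancel out after the two convolutions with f.

```python
import random

N, q = 11, 2048            # toy parameters; NTRU-443/743 use N = 443/743 with q = 2048
random.seed(3)

def cyclic_mul(a, b):
    """Circular convolution in Z_q[x]/(x^N - 1), cf. Eq. (3) in the paper."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[(i + j) % N] = (res[(i + j) % N] + ai * bj) % q
    return res

f = [random.choice([-1, 0, 1]) % q for _ in range(N)]   # trinary private key
e = [random.randrange(q) for _ in range(N)]             # ciphertext
mask = [random.randrange(q) for _ in range(N)]          # fresh additive masks

e_masked = [(x + m) % q for x, m in zip(e, mask)]
unmasked = [(a - b) % q for a, b in zip(cyclic_mul(f, e_masked), cyclic_mul(f, mask))]
print(unmasked == cyclic_mul(f, e))   # True: f*(e+mask) - f*mask = f*e
```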
I think they could have skipped trivial figures 4 and 5 and instead argue in more detail why the masking only has to be applied to the multiplication but not other parts.
- convolution polynomial ring = ring with polynomials as elements and convolution as multiplication operation
- Algorithm 2 shows only “+ b_k” instead of “+ a_i b_k” because a_i = 1
- NIST submission round 1 NTRUEncrypt features NTRU-443, NTRU-743 and NTRU-1024 where the last one is unsupported in this scheme due to SIMD and a non-power-of-two parameter q.
Region-Based Memory Management in Cyclone §
Title: “Region-Based Memory Management in Cyclone” by Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, James Cheney [url] [dblp]
Published in 2002-06 at PLDI 2002 and I read it in 2023-05-14
Abstract: Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline. The design incorporates several advancements, including support for region subtyping and a coherent integration with stack allocation and a garbage collector. To support separate compilation, Cyclone requires programmers to write some explicit region annotations, but a combination of default annotations, local type inference, and a novel treatment of region effects reduces this burden. As a result, we integrate C idioms in a region-based framework. In our experience, porting legacy C to Cyclone has required altering about 8% of the code; of the changes, only 6% (of the 8%) were region annotations.
quotes
- “Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory management of Cyclone and its static typing discipline.” (Grossman et al., p. 1)
- “to reduce the need for type casts, Cyclone has features like parametric polymorphism, subtyping, and tagged unions. To prevent bounds violations without making hidden data-representation changes, Cyclone has a variety of pointer types with different compile-time invariants and associated run-time checks.” (Grossman et al., p. 1)
- “Following the seminal work of Tofte and Talpin [28], the system is region-based : each object lives in one region and, with the exception that a distinguished heap region may be garbage collected, a region’s objects are all deallocated simultaneously.” (Grossman et al., p. 2)
- “In our experience, porting C code has required altering about 8% of the code, and the vast majority of changes have not been region annotations.” (Grossman et al., p. 2)
- “Dynamic regions are created with the construct region r{s}, where r is an identifier and s is a statement. The region’s lifetime is the execution of s. In s, r is bound to a region handle, which primitives rmalloc and rnew use to allocate objects into the associated region. For example, rnew(r) 3 returns a pointer to an int allocated in the region of handle r and initialized to 3.” (Grossman et al., p. 2)
- “Like a declaration block, a dynamic region is deallocated precisely when execution leaves the body of the enclosed statement.” (Grossman et al., p. 2)
- “Pointers to arrays of unknown size (denoted τ ?) are implemented with extra fields to support bounds-checks, but this design is orthogonal to regions.” (Grossman et al., p. 2)
- “int*ρ describes a pointer to an int that is in the region whose name is ρ.” (Grossman et al., p. 2)
- “A handle for a region corresponding to ρ has the type region_t<ρ>.” (Grossman et al., p. 3)
- “A block labeled L (e.g., L:{int x=0;s}) has name ρL and refers to the stack region that the block creates.” (Grossman et al., p. 3)
- “We can now give types to some small examples. If e1 has type region_t<ρ> and e2 has type τ, then rnew(e1) e2 has type τ*ρ” (Grossman et al., p. 3)
- “Preventing dangling-pointer dereferences. To dereference a pointer, safety demands that its region be live.” (Grossman et al., p. 3)
- “Functions in Cyclone are region polymorphic; they can abstract the actual regions of their arguments or results. That way, functions can manipulate pointers regardless of whether they point into the stack, the heap, or a dynamic region.” (Grossman et al., p. 3)
- “The ? is Cyclone notation for a pointer to a dynamically sized array.” (Grossman et al., p. 3)
- “Because struct definitions can contain pointers, Cyclone allows these definitions to be parameterized by region names.” (Grossman et al., p. 3)
- “To solve this problem, we observe that if the region corresponding to ρ1 outlives the region corresponding to ρ2, then it is sound to use a value of type τ*ρ1 where we expect one of type τ*ρ2. Cyclone supports such coercions implicitly. The last-in-first-out region discipline makes such outlives relationships common: when we create a region, we know every region currently alive will outlive it.” (Grossman et al., p. 4)
- “To ensure soundness, we do not allow casting τ1*ρ to τ2*ρ, even if τ1 is a subtype of τ2, as this cast would allow putting a τ2 in a location where other code expects a τ1. (This problem is the usual one with covariant subtyping on references.)” (Grossman et al., p. 4)
- “We emphasize that our approach to inference is purely intraprocedural and that prototypes for functions are never inferred. Rather, we use a default completion of partial prototypes to minimize region annotations. This approach permits separate compilation.” (Grossman et al., p. 4)
- “the compiler deduces an appropriate annotation based on context: 1. For local declarations, a unification-based inference engine infers the annotation from the declaration’s (intraprocedural) uses. This local inference works well in practice, especially when declarations have initializers. 2. Omitted region names in argument types are filled in with fresh region names that are generalized implicitly. So by default, functions are region polymorphic without any region equalities. 3. In all other contexts (return types, globals, type definitions), omitted region names are filled in with ρH (i.e., the heap). This default works well for global variables and for functions that return heap-allocated results. However, it fails for functions like strcpy that return one of their parameters. Without looking at the function body, we cannot determine which parameter (or component of a parameter) the function might return.” (Grossman et al., p. 4)
- “Cyclone does not have closures, but it has other typing constructs that hide regions. In particular, Cyclone provides existential types [22, 14], which suffice to encode closures [21] and simple forms of objects [5]. Therefore, it is possible in Cyclone for pointers to escape the scope of their regions.” (Grossman et al., p. 5)
- “To address this problem, the Cyclone type system keeps track of the subset of region names that are considered live at each control-flow point. Following Walker, Crary, and Morrisett [29], we call the set of live regions the capability. To allow dereferencing a pointer, the type system ensures that the associated region name is in the capability. Similarly, to allow a function call, Cyclone ensures that regions the function might access are all live. To this end, function types carry an effect that records the set of regions the function might access.” (Grossman et al., p. 5)
- “The second major departure from TT is that we do not have effect variables.” (Grossman et al., p. 5)
- “To simplify the system while retaining the benefit of effect variables, we use a type operator, regions_of(τ). This novel operator is just part of the type system; it does not exist at runtime. Intuitively, regions_of(τ) represents the set of regions that occur free in τ” (Grossman et al., p. 5)
- “For type variables, regions_of(α) is treated as an abstract set of region variables, much like effect variables. For example, regions_of(α*ρ) = {ρ} ∪ regions_of(α)” (Grossman et al., p. 6)
- “Cyclone supports existential types, which allow programmers to encode closures.” (Grossman et al., p. 6)
- “In a separate technical report [15], we have defined an operational model of Core Cyclone, formalized the type system, and proven type soundness.” (Grossman et al., p. 6)
- “The code generator ensures that regions are deallocated even when their lifetimes end due to unstructured control flow.” (Grossman et al., p. 8)
- “In this fashion, we ensure that a region is always deallocated when control returns.” (Grossman et al., p. 8)
- “We took two approaches to porting. First, we changed all the programs as little as possible to make them correct Cyclone programs. Then, for cfrac and mini_httpd,we regionized the code: We made functions more region polymorphic and, where possible, eliminated heap allocation in favor of dynamic region allocation with rnew. We also added compiler-checked “not null” annotations to pointer types where possible to avoid some null checks.” (Grossman et al., p. 8)
- “There are two interesting results regarding the difficulty of minimal porting. First, the overall changes in the programs are relatively small — less than 10% of the program code needed to be changed. The vast majority of the differences arise from pointer-syntax alterations. These changes are typically easy to make — e.g., the type of strings are changed from char * to char ?. We are currently experimenting with interpreting char * as a safe null-terminated string type by default; doing so allows many fewer changes. The most encouraging result is that the number of region annotations is small: only 124 changes (which account for roughly 6% of the total changes) in more than 18,000 lines of code. The majority of these changes were completely trivial, e.g., many programs required adding ρH annotations to argv so that arguments could be stored in global variables.” (Grossman et al., p. 9)
- “The cost of porting a program to use dynamic regions was also reasonable; in this case roughly 13% of the total differences were region-related.” (Grossman et al., p. 9)
- “For the non-web benchmarks (and some of the web benchmarks) the median and mean were essentially identical, and the standard deviation was at most 2% of the mean. The factor columns for the Cyclone programs show the slowdown factor relative to the C versions.” (Grossman et al., p. 10)
- “we pay a substantial penalty for compute-intensive benchmarks; the worst is grobner, which is almost a factor of three slower than the C version.” (Grossman et al., p. 10)
- “As Table 3 demonstrates, bounds-checks are also an important component of the overhead, but less than we expected. We found that a major cost is due to the representation of fat pointers. A fat pointer is represented with three words: the base address, the bounds address, and the current pointer location (essentially the same representation used by McGary’s bounded pointers [20]).” (Grossman et al., p. 10)
- “Many systems, including but certainly not limited to LCLint [10, 9], SLAM [3], Safe-C [2], and CCured [25], aim to make C code safe.” (Grossman et al., p. 10)
- “As they note, region-based programming in C is an old idea; they contribute language support for efficient reference counting to detect if a region is deallocated while there remain pointers to it (that are not within it). This dynamic system has no a priori restrictions on regions’ lifetimes and a pointer can point anywhere, so the RC approach can encode more memory-management idioms.” (Grossman et al., p. 11)
- “Instead, we are currently developing a traditional intraprocedural flow analysis to track region aliasing and region lifetimes.” (Grossman et al., p. 11)
summary
This paper introduces region-based memory management for the C programming language, following the seminal work of Tofte and Talpin. A compiler is implemented that takes regions into account: dereferencing a dangling pointer becomes a compile-time error, memory is reclaimed by deallocating a region as a whole once the statement enclosing it is left, and arbitrary-size arrays get run-time bounds checking via fat pointers.
The paper describes the extended type system, the notions of subtyping, polymorphism, and “outliving” for regions, sketches a soundness proof (provided fully in a technical report), and presents inference techniques that avoid most region annotations. As a result, only about 8% of the ported code had to be changed. The overhead of the run-time checks is often negligible, but it noticeably worsens the performance of CPU-intensive tasks.
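As a rough intuition for `region r{s}` and `rnew`, here is a Python analogy of my own (Cyclone enforces all of this statically at compile time, whereas this sketch only checks at run time): objects are allocated through a region handle and become unusable once the region's body has been left.

```python
class Region:
    """Run-time analogy of Cyclone's `region r {s}`: everything allocated through
    the handle is deallocated when the with-block (the region's lifetime) ends."""
    def __init__(self):
        self._objects = []
        self._live = True

    def rnew(self, value):            # analogue of `rnew(r) value`
        assert self._live, "allocation into a dead region"
        cell = {"value": value, "region": self}
        self._objects.append(cell)
        return cell

    def deref(self, cell):            # dereferencing checks that the region is live
        assert cell["region"]._live, "dangling-pointer dereference"
        return cell["value"]

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._live = False            # bulk deallocation of the whole region
        self._objects.clear()

with Region() as r:
    p = r.rnew(3)
    print(r.deref(p))                 # fine inside the region
# r.deref(p) here would fail: the region (and p's target) is gone
```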
SEVurity: No Security Without Integrity §
Title: “SEVurity: No Security Without Integrity” by Luca Wilke, Jan Wichelmann, Mathias Morbitzer, Thomas Eisenbarth [url] [dblp]
Published in 2020 at IEEE Symposium on Security and Privacy 2020 and I read it in 2020-06
Abstract: One reason for not adopting cloud services is the required trust in the cloud provider: As they control the hypervisor, any data processed in the system is accessible to them. Full memory encryption for Virtual Machines (VM) protects against curious cloud providers as well as otherwise compromised hypervisors. AMD Secure Encrypted Virtualization (SEV) is the most prevalent hardware-based full memory encryption for VMs. Its newest extension, SEV-ES, also protects the entire VM state during context switches, aiming to ensure that the host neither learns anything about the data that is processed inside the VM, nor is able to modify its execution state. Several previous works have analyzed the security of SEV and have shown that, by controlling I/O, it is possible to exfiltrate data or even gain control over the VM’s execution. In this work, we introduce two new methods that allow us to inject arbitrary code into SEV-ES secured virtual machines. Due to the lack of proper integrity protection, it is sufficient to reuse existing ciphertext to build a high-speed encryption oracle. As a result, our attack no longer depends on control over the I/O, which is needed by prior attacks. As I/O manipulation is highly detectable, our attacks are stealthier. In addition, we reverse-engineer the previously unknown, improved Xor-Encrypt-Xor (XEX) based encryption mode, that AMD is using on updated processors, and show, for the first time, how it can be overcome by our new attacks.
Intel SGX's role
Intel Software Guard Extensions (SGX) was the first widely available solution for protecting data in RAM. However, it only can protect a small chunk of RAM, not the VM as a whole.
Memory encryption systems for VMs
- AMD Secure Memory Encryption (SME) (2016): drop-in, AES-based RAM encryption. Controlled by Secure Processor (SP) co-processor. A special bit in the page table – the so-called C-bit – is used to indicate whether a page should be encrypted. The page table in the guest is encrypted and thus not accessible by the hypervisor.
- AMD Secure Encrypted Virtualization (SEV): SEV extends SME for VMs by using different encryption keys per VM, in order to prohibit the hypervisor from inspecting the VM’s main memory
- SEV Encrypted State (SEV-ES)
- Intel Total Memory Encryption (TME)
- Intel Multi-Key Total Memory Encryption (MKTME)
Comparison: S. Mofrad, F. Zhang, S. Lu, and W. Shi, “A comparison study of intel sgx and amd memory encryption technology,”, 2018
Notes
- Why is the tweak value only defined for index ≥ 4? “We denote the first one as t_4, since there are no dedicated constants for the least significant bits 3 to 0.” seems to be a resulting condition, not a reason.
- Is the equation system resulting from Dec_K(Enc_N(…), q_j) really linear? XOR with m makes it affine linear, doesn't it? Is bit(i, p) linear? I don't think so unless we are in ℤ_2?!
- See Table 1. They used 4-byte repeating patterns for the tweak constants.
- 0x7413 is “jump if equal” on x86_64, but 0xEB13 is “jump unconditionally”
- page 8: 16 MB = 16 777 216 bytes = 134 217 728 bits.
  bytes = 16 * 1024**2
  bits = 8 * bytes
  seconds = 37.5  # not stated explicitly on the page; derived from the quoted rates below
  (bytes/seconds) / 1024**2  # Mbytes/sec ⇒ 0.4266666666666667
  (bits/seconds) / 1024**2   # MBits/sec ⇒ 3.4133333333333336, ok, this occurs in the paper
  (bytes/seconds) / 1024     # Kbytes/sec ⇒ 436.9066666666667, this diverges from the paper with “426.67 KB/s”
- “The first mechanism utilizes the cpuid instruction, which is emulated by the hypervisor and features a 2-byte opcode: Each time cpuid is executed, the hypervisor is called to emulate it.”
On virtual memory with virtual machines
On virtualized systems, two different page tables are used. Within the VM, the VA used by the guest, the Guest Virtual Address (GVA), is translated to the Guest Physical Address (GPA). The GPA is the address which the VM considers to be the PA. However, on the host system itself another page table is introduced, to allow multiple VMs to run on the same physical machine. This second page table is called the Second Level Address Translation (SLAT), or Nested Page Table (NPT) [5]. The NPT translates the GPA into the Host Physical Address (HPA), the actual address of the data in physical memory.
summary
- Point of this paper: AMD SEV lacks integrity protection and thus memory encryption of a virtual machine can be broken by an untrusted hypervisor. Hypervisor can thus modify memory content (i.e. data or instructions) of a VM arbitrarily.
- The researchers determined tweak values T(p) using a linear equation system. The AMD Epyc Embedded processor line was considered. AMD Epyc 7251 (released June 2017) used the XE encryption mode whereas AMD Epyc (released Feb 2018) uses the XEX encryption mode
- Xor-Encrypt (XE): Enc_K(m, p) := AES_K(m ⊕ T(p)) and Dec_K(c, p) := AES⁻¹_K(c) ⊕ T(p)
- Xor-Encrypt-Xor (XEX): Enc_K(m, p) := AES_K(m ⊕ T(p)) ⊕ T(p) and Dec_K(c, p) := AES⁻¹_K(c ⊕ T(p)) ⊕ T(p)
- We never know the key K. Thus we cannot decrypt text and encrypt it again. But since we know T(p) where p is chosen to be physical address bit (i.e. address of a 16-byte block), then moving blocks changes the ciphertext in a predictable way
- XE: Dec_K(Enc_K(m, p), q_j) = AES⁻¹_K(AES_K(m ⊕ T(p))) ⊕ T(q_j) = m ⊕ T(p) ⊕ T(q_j) = m ⊕ T(p ⊕ q_j)
- When can we make changes in the memory? Whenever control from the VM is handed back from the VM to the hypervisor.
- We can simply remove the executable bit from the page and a page fault will be triggered. Then control will be returned back. Then we also know the address p. Adding the executable bit again and returning control allows to continue instructions stepwise (stepwise execution is a common SGX attack strategy, but unimportant for us)
- Or we can inject the cpuid instruction. The hypervisor is responsible for emulating/answering cpuid by copying the return values to a predefined memory segment.
- So we can move pages without screwing up the encryption. But how can we actually modify memory content? We define an encryption oracle:
- At boot time, cpuid is issued. We flip bits and create a loop to repeatedly call cpuid.
- The hypervisor provides content as return value of cpuid. The kernel in the VM encrypts it. Thus the hypervisor can read the encrypted version of the plaintext it provided.
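To illustrate the XE identity above, here is a self-contained toy model (my own code; a small Feistel construction built from SHA-256 stands in for AMD's hardware AES, and the tweak constants are made up): because T(p) is the XOR of fixed constants selected by the address bits, decrypting a ciphertext at another address q yields m ⊕ T(p ⊕ q), which an attacker can compute without knowing the key.

```python
import hashlib, secrets

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def F(key, half):                     # round function: truncated SHA-256
    return hashlib.sha256(key + half).digest()[:8]

def E(key, block):                    # toy 4-round Feistel standing in for AES
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, xor(l, F(key + bytes([i]), r))
    return l + r

def D(key, block):                    # inverse of E
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = xor(r, F(key + bytes([i]), l)), l
    return l + r

# Tweak: XOR of fixed constants selected by the physical-address bits,
# so T(p) ^ T(q) == T(p ^ q) -- the property the attack relies on.
CONST = [hashlib.sha256(b"tweak" + bytes([i])).digest()[:BLOCK] for i in range(32)]
def T(p):
    t = bytes(BLOCK)
    for i in range(32):
        if (p >> i) & 1:
            t = xor(t, CONST[i])
    return t

def enc_xe(key, m, p):                # XE: AES_K(m ^ T(p))
    return E(key, xor(m, T(p)))

def dec_xe(key, c, p):                # XE: AES^-1_K(c) ^ T(p)
    return xor(D(key, c), T(p))

key = secrets.token_bytes(16)
m = b"attack at dawn!!"               # one 16-byte plaintext block
p, q_addr = 0x1000, 0x2A40            # two 16-byte-aligned "physical addresses"

c = enc_xe(key, m, p)                 # ciphertext as written to address p
moved = dec_xe(key, c, q_addr)        # the VM decrypts the same ciphertext at address q
print(moved == xor(m, T(p ^ q_addr))) # True: predictable change, no key needed
```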
Trust in cloud providers
One reason for not adopting cloud services is the required trust in the cloud provider: As they control the hypervisor, any data processed in the system is accessible to them.
Tweakable block ciphers
One popular method for storage encryption are tweakable block ciphers, such as AES-XTS [1], which is, e.g., used in Apple’s FileVault, MS Bitlocker and Android’s file-based encryption. Tweakable block ciphers provide encryption without data expansion as well as some protection against plaintext manipulation. A tweak allows to securely change the behavior of the block cipher, similar to instantiating it with a new key, but with little overhead.
XEX and Xor-Encrypt (XE) are methods to turn a block cipher such as AES into a tweakable blockcipher, where a tweak-derived value is XORed with the plaintext before encryption (and XORed again after encryption in the case of XEX).
typo
“before it continuous its operation” → “before it continues its operation.”
SOK: On the Analysis of Web Browser Security §
Title: “SOK: On the Analysis of Web Browser Security” by Jungwon Lim, Yonghwi Jin, Mansour Alharthi, Xiaokuan Zhang, Jinho Jung, Rajat Gupta, Kuilin Li, Daehee Jang, Taesoo Kim [url] [dblp]
Published in 2021-12 at and I read it in 2022-02
Abstract: Web browsers are integral parts of everyone’s daily life. They are commonly used for security-critical and privacy sensitive tasks, like banking transactions and checking medical records. Unfortunately, modern web browsers are too complex to be bug free (e.g., 25 million lines of code in Chrome), and their role as an interface to the cyberspace makes them an attractive target for attacks. Accordingly, web browsers naturally become an arena for demonstrating advanced exploitation techniques by attackers and state-of-the-art defenses by browser vendors. Web browsers, arguably, are the most exciting place to learn the latest security issues and techniques, but remain as a black art to most security researchers because of their fast-changing characteristics and complex code bases.
quotes
- “More specifically, we first introduce a unified architecture that faithfully represents the security design of four major web browsers. Second, we share insights from a 10-year longitudinal study on browser bugs. Third, we present a timeline and context of mitigation schemes and their effectiveness. Fourth, we share our lessons from a full-chain exploit used in 2020 Pwn2Own competition. We believe that the key takeaways from this systematization can shed light on how to advance the status quo of modern web browsers, and, importantly, how to create secure yet complex software in the future.”
- “Note that although Universal Cross-Site Scripting (UXSS) [166] sounds similar to XSS, it commonly originates from problems in the browser’s implementation and design, so it is considered web browser security (§III-E).”
- “The complex nature of websites leads to numerous security policies and unique features of each web browser.”
- “JavaScript engines are the core of modern browsers, which convert JavaScript code into machine code. Major browsers use just-in-time (JIT) compilation to speed up the code execution. Also, JIT compilers model the result and side-effects of all operations and run various analysis passes to optimize the code. If any of these goes wrong, native code with memory corruption issues can be emitted and executed, which can lead to severe security implications. While each engine has different implementations, they share similar design principles and have common attack surfaces. Therefore, attackers can build generic attack primitives which work across different engines, such as fakeobj and addrof primitives and element kind transitions. JavaScript engines are being used outside browsers as well (e.g. Node.js uses V8), amplifying the impact of security bugs in JavaScript engines. We discuss issues caused by homogeneous browser engines in section 6.”
- “Linux. Unlike on Windows, the Linux sandbox is mainly based on seccomp, chroot and namespace. First, seccomp is a standard system call filter based on the eBPF language. Since the default seccomp configuration is overly tight, browsers define their own filtering rules. For example, Chrome applies its custom seccomp rules to all processes except the broker process, and the detailed rules vary for each process. Second, to restrict file access, Linux-based browser sandboxes utilize chroot jailing. Once a process is confined with chroot, no upper hierarchy of the file system is reachable. For example, Firefox applies chroot jailing to all renderers and only allows them to access specific files based on file descriptors obtained from the broker process. Also, browsers use namespaces to create separated spaces for various resources, such as user, networking, and IPC. For example, creating and joining a user namespace enables a sandboxed process to be in a separate UID and GID, effectively disabling access to other unsandboxed processes.”
- “Since a process-based sandbox uses a non-trivial amount of memory, mobile platforms introduce subtle differences in sandbox policies or disable them depending on the available resources. For example, on Android, Site Isolation in Chrome is enabled only when the device has enough memory (>1.9GB), and the user need to enter passwords on the website. On iOS, Safari uses sandbox rules that are different from macOS because different system services and IOKit drivers are exposed on mobile. Due to such differences, some exploits may work only on mobile platforms.”
- “Renderer bugs are dominant in both Firefox and Chromium since renderers are the core of browsers.”
- “Recently, there have been efforts in rewriting browsers using memory-safe languages (e.g., Rust) to mitigate memory-safety bugs. For example, Mozilla is rewriting parts of Firefox in Rust with an ongoing project called Oxidation. Up until 2020, the Oxidation project had replaced 12% of Firefox's components with Rust equivalents. Five of the replaced subcomponents fall under the renderer's media parsing component.”
- “Since DOM bugs mostly rely on UAF problems, they have been mostly mitigated by UAF mitigations.”
- “Overwrite protection. Overwrite protections are standard protection mechanisms to prevent attackers from introducing arbitrary executable code, which can be seen as the last line of defense in the context of browser exploits. They mainly include four mechanisms: W $\oplus$ X, hardened JIT mapping, fast permission switch, and out-of-process JIT.”
- “As a result, mitigations in JS engines focus on eliminating attack primitives. Recently, the Edge team added a new security feature called Super Duper Secure Mode (SDSM), which basically disables JIT compilation. Users can choose to disable JIT on websites that are less frequently visited. While sacrificing some performance, it is a good approach for reducing attack surfaces.”
- “Same origin policy (SOP) is enforced by web browsers to keep a security boundary between different origins. SOP-bypass bugs can be used to compromise SOP to varying degrees, from leaking one bit to stealing full-page data. UXSS bugs are the most powerful type of SOP-bypass bug that can be used to facilitate cross-origin JavaScript code execution. In UXSS attacks, the attacker can inject scripts to any affected context by exploiting bugs in web browsers or third-party extensions, achieving the same effect as exploiting the XSS vulnerability in the target website.”
- “Site isolation is an effective mitigation against UXSS bugs. However, only Chrome and Firefox have site isolation deployed, since it requires a considerable amount of engineering effort”
- “Although vendors are trying, they are consistently behind in this arms race. Mitigations from vendors are mostly reactive, which means they are developed long after each wave of attacks. By the time an attack surface is finally closed, attackers have already come up with a better exploit.”
- “Modern browsers implement a basic level of heap separation between Javascript-reachable objects and other objects”
- “Delayed free. Another mitigation, delayed free, effectively increases the difficulty of exploiting UAF bugs, but this approach cannot restrict the reclamation of dangling pointers. Browsers use various garbage collection (GC) algorithms to deallocate heap-allocated objects with no references. Some variants of GC additionally scan stack and heap areas to find possibly overlooked references, which is known as conservative scanning or delayed free”
- “Browsers are also vulnerable to side-channel attacks. To date, studies have shown that sensitive information in browsers can be inferred via
- microarchitectual state;
- GPU;
- floating-point timing channels and
- browser-specific side channels”
- “Cross-Origin-Opener-Policy (COOP) and Cross-Origin-Embedder-Policy (COEP) were introduced to set up a cross-origin isolated environment. COOP allows a website to include a response header on a top-level document, ensuring that the cross-origin documents do not share the same browsing context group with itself, thus preventing direct DOM access.”
- “Improving memory safety. The Chrome team has explored improvements for their C++ codebase that can eliminate/reduce specific types of bugs by limiting the use of specific language features (e.g., C++ exceptions) and introducing wrapper
classes around integer operation.”
- “In the case of Spectre/Meltdown attacks, browser vendors worked together to build a plan for mitigating the immediate threats, which is a great example of collaborative effort.”
- “For example, iOS Safari was exploited due to the 1.5-month patch gap”
- “Some industrial efforts on fuzzing browsers are highly effective on finding complex browser bugs. For example, ClusterFuzz runs on over 25,000 cores and found over 29,000 bugs in Chrome.”
summary
The paper looks at the Chrome/Blink/V8, Safari/WebKit/JavaScriptCore, Firefox/Gecko/SpiderMonkey, and Internet Explorer/Trident/Chakra web browsers and their architecture with respect to security requirements. This concerns rendering, IPC, site isolation, and JavaScript engine security as well as web technologies like the Same-Origin Policy. In Table 1, one can see many sandboxing/isolation mechanisms in place, especially in Chrome and Firefox. Browser exploitation scenarios and bug classifications are provided in Figure 2. The paper also looks at the history and timeline of bugs and bug classes as well as strategies to detect them. Table 4 provides a very detailed look at mitigations in browsers in relation to time.
In summary, the paper provides a very detailed look at web browser security. It can serve as a list of keywords to get started on web security topics. If you plan to build your own web browser engine, it will help you understand the requirements and give you architectural design ideas. It also gives a historical overview of which mitigations were deployed as time progressed.
Lessons:
- Using memory safe languages is an effective mitigation against memory-safety bugs.
- Higher payouts motivate more bug reports.
- UAF mitigations are effective towards reducing DOM bug exploits.
- Mitigating JS engine bugs is difficult.
- UXSS bugs are mostly mitigated by Site Isolation.
- Collaborative efforts on mitigations are good.
One controversial aspect:
“Although vendors are trying, they are consistently behind in this arms race. Mitigations from vendors are mostly reactive, which means they are developed long after each wave of attacks. By the time an attack surface is finally closed, attackers have already come up with a better exploit.”
I am not really sure about this statement; I think it is too strong. Did the authors consider how many mitigations were put in place that prevented security exploits in the first place? The statement gives no credit for those decisions.
citation 121 = “Safer Usage Of C++” by Google Security
- “C/C++’s integer semantics are bonkers: the wrapping, overflow, underflow, undefined behavior, implicit casting, and silent truncation behaviors all add up to unsafety and poor ergonomics.”
- “P1705R1 Enumerating Core Undefined Behavior” by Shafik Yaghmour
Scribble: Closing the Book on Ad Hoc Documentation Too… §
Title: “Scribble: Closing the Book on Ad Hoc Documentation Tools” by Matthew Flatt, Eli Barzilay, Robert Bruce Findler [url] [dblp]
Published in 2009-09 at ICFP'09 and I read it in 2022-01
Abstract: Scribble is a system for writing library documentation, user guides, and tutorials. It builds on PLT Scheme’s technology for language extension, and at its heart is a new approach to connecting prose references with library bindings. Besides the base system, we have built Scribble libraries for JavaDoc-style API documentation, literate programming, and conference papers. We have used Scribble to produce thousands of pages of documentation for PLT Scheme; the new documentation is more complete, more accessible, and better organized, thanks in large part to Scribble’s flexibility and the ease with which we cross-reference information across levels. This paper reports on the use of Scribble and on its design as both an extension and an extensible part of PLT Scheme.
quotes
- “Besides the base system, we have built Scribble libraries for JavaDoc-style API documentation, literate programming, and conference papers. We have used Scribble to produce thousands of pages of documentation for PLT Scheme; the new documentation is more complete, more accessible, and better organized, thanks in large part to Scribble’s flexibility and the ease with which we cross-reference information across levels.”
- “Most existing documentation tools fall into one of three categories: LaTeχ-like tools that know nothing about source code; JavaDoc-like tools that extract documentation from annotations in source code; and WEB-like literate-programming tools where source code
is organized around a prose presentation.”
- “Specifically, Scribble leverages lexical scoping as supplied by the underlying programming language, instead of ad hoc textual manipulation, to connect documentation and code.”
- “We developed Scribble primarily for stand-alone documentation, but we have also developed a library for JavaDoc-style extraction of API documentation, and we have created a WEB-style tool for literate programming. In all forms, Scribble’s connection between documentation and source plays a crucial role in crossreferencing, in writing examples within the documentation, and in searching the documentation from within the programming environment.”
- “The Scribble syntax for generating this document fragment is reminiscent of LaTeχ, using @ (like texinfo) instead of \:”
- “The initial #lang scribble/doc line declares that the module uses Scribble’s documentation syntax, as opposed to using #lang scheme for S-expression syntax.”
- “In this definition, real? and pict? are contracts for the function argument and result. Naturally, they are in turn hyperlinked to their definitions, because suitable libraries are imported for-label in the documentation source.”
- “The above documentation of circle is implemented using defproc:
@defproc[(circle [diameter real?]) pict?]{
Creates an unfilled ellipse.
}
Alternatively, instead of writing the documentation for circle in a stand-alone document—where there is a possibility that the documented contract does not match the contract in the implementation—the documentation could be written with the implementation of circle. In that case, the documentation would look slightly different, since it would be part of the module’s export declarations:
(provide/doc
[circle ([diameter real?] . -> . pict?)
@{Creates an unfilled ellipse.}])”
- “Users of a text-markup language experience first and foremost the language’s concrete syntax. The same is true of any language, but in the case of text, authors with different backgrounds have arrived at a remarkably consistent view of the appropriate syntax: it should use blank lines to indicate paragraph breaks, double-quote characters should not be special, and so on. At the same time, a programmable mark-up language needs a natural escape to the programming layer and back.”
- “For Scribble, our solution is the @-notation, which is a text-friendly alternative to traditional S-expression syntax. More precisely, the @-notation is another way to write down arbitrary S-expressions, but it is tuned for writing blocks of free-form text.”
- “The grammar of an @-expression is roughly as follows (where @, [, ], {, and } are literal, and x? means that x is optional):
<at-expr> := @<op>? [<S-expr>*]? {<text>} ?
<op> := <S-expr> that does not start with [ or {
<S-expr> := any PLT Scheme S-expression
<text> := text with balanced {...} and with @-exprs ”
- “Section content should be grouped implicitly via section, subsection, etc. declarations, instead of explicitly nesting section constructions.”
- “Paragraph breaks should be determined by empty lines in the source text, instead of explicitly constructing paragraph values.”
- “A handful of ASCII character sequences should be converted automatically to more sophisticated typesetting elements, such as converting ‘‘ and ’’ to curly quotes or --- to an em-dash.”
- “These transformations are specific to typesetting, and they are not appropriate for other contexts where the @ notation is useful. Therefore, the @ parser in Scribble faithfully preserves the original text in Scheme strings, and a separate decode layer in Scribble provides additional transformations.”
- “Functions like bold and emph apply decode-content to their arguments to perform ASCII transformations, and item calls decode-flow to transform ASCII sequences and form paragraphs between empty lines. In contrast, tt and verbatim do not call the decode layer, and they instead typeset text exactly as it is given.”
- “As an embedded domain-specific language, Scribble follows a long tradition of using Lisp- and Scheme-style macros to implement little languages. In particular, Scribble relies heavily on the Scheme notion of syntax objects (Sperber 2007), which are fragments of code that have lexical-binding information attached. Besides using syntax objects in the usual way to implement macros, Scribble uses syntax objects to carry lexical information all the way through document rendering.”
- “A deeper dependence of Scribble on PLT Scheme relates to #lang parsing. The #lang notation organizes reader extensions of Scheme (i.e., changes to the way that raw text is converted to S-expressions) to allow new forms of surface syntax. The identifier
after #lang in the original source act as the ‘language’ of a module.”
- “To parse a #lang line, the identifier after #lang is used as the name of a library collection that contains a "lang/reader.ss" module. The collection’s "lang/reader.ss" module must export a read-syntax function, which takes an input stream and produces a syntax object. The "lang/reader.ss" module for scribble/doc parses the given input stream in @-notation text mode, and then wraps the result in a module form. For example,
#lang scribble/doc
@(require scribble/manual)
It was a @bold{dark} and @italic{stormy} night.
in a file named "hello.scrbl" reads as
(module hello scribble/doclang
doc ()
"\n" (require scribble/manual) "\n"
"It was a " (bold "dark") " and "
(italic "stormy") " night." "\n")
where doc is inserted by the scribble/doc reader as the identifier to export from the module, and the () is a convenience explained below.”
- “The doc binding that a Scribble module exports is a description of a document. Various tools, such as the scribble command-line program, can take this description of a document and render it to a specific format, such as LaTeχ or HTML.”
- “Scribble’s documentation abstraction reflects a least-common denominator among such document formats. For example, Scribble has a baked-in notion of itemization, since LaTeχ, HTML, and other document formats provide specific support to typeset itemizations. For many other layout tasks, such as formatting Scheme code, Scribble documents fall back to a generic ‘table’ abstraction. Similarly, Scribble itself resolves most forms of cross-references and document dependencies, since different formats provide different levels of automatic support; tables of contents and indexes are mostly built within Scribble, instead of the back-end.”
- “An element within a paragraph can be one of the following:
- a plain string;
- an instance of the element structure type, […]
- a target-element, which associates a cross-reference tag with a list of elements, […]
- a link-element, which associates a cross-reference tag to a list of elements, […]
- a delayed-element eventually expands to a list of elements. […]
- A collect-element is the complement of delayed-element: […]
- A few other element types support more specialized tasks, […]”
- “The examples form of the scribble/eval library typesets an example along with its result using the style of a read-eval-print loop. For example,
@examples[(/ 1 2) (/ 1 2.0) (/ 1 +inf.0)]
produces the output:
Examples:
> (/ 1 2)
1/2
> (/ 1 2.0)
0.5
> (/ 1 +inf.0)
0.0 ”
- “Unlike a normal Scribble program, running a scribble/lp program ignores the prose exposition and instead evaluates the program in the chunks. In literate programming terminology, this process is called tangling the program.”
- “To recover the prose, the @lp-include[filename] form extracts a literate view of the program from filename.”
- “The arrows in Figure 3’s screenshot demonstrate how DrScheme can draw arrows from chunk bindings to chunk references, and from the binding occurrence of an identifier to its bound occurrences, even across chunks. These latter arrows are particularly helpful with literate programs, where lexical scope is sometimes obscured by the way that textually disparate fragments of a program are eventually tangled into the same scope.”
- “Although many existing PLT Scheme tools help in building documents, the process of generating HTML is significantly different from compilation tasks. The main difference is that cyclic dependencies are common in documentation, whereas library dependencies are strictly layered. For example, the core language reference contains many pointers into the overview and a few pointers to the GUI library and other extensions; all documents, meanwhile, refer back to the core reference.”
- “PLT Scheme documentation was previously written in LaTeχ and converted to HTML via tex2page (Sitaram 2007). Although tex2page was a dramatic improvement over our original use of latex2html, the build process relied on layers of fragile LaTeχ macros, HTML hacks, and pre- and post-processing scripts, which made the documentation all but impossible to build except by its authors.”
- “The LaTeχ category includes general word-processing tools like Microsoft Word, but LaTeχ offers the crucial advantage of programmability, where macros enable automatic formatting of API details. Systems like Skribe (Gallesio and Serrano 2005) improve LaTeχ by offering a sane programming language.”
- “The JavaDoc category includes perldoc for Perl, RDoc for Ruby, Haddock (Marlow 2002) for Haskell, OCamlDoc (Leroy 2007), Doxygen (van Heesch 2007) for various languages (including Java, C++, C#, and Fortran), and many others.”
- “Literate programming tools such as WEB (Knuth 1984) and noweb (Ramsey 1994) are designed for documenting the implementation of a library as much as the API that a library exports. In a sense, these tools are an extreme version of the JavaDoc category, where the information communicated to a reader is drawn from both the prose and the executable source. In doing so, unfortunately, the tools typically revert to a textual slice-and-dice of the program and prose sources, instead of a programmable layer that spans the two halves.”
- “Simonis and Weiss (2003) provide a more complete overview of existing systems and add ProgDoc, which is similar to noweb in the way that it uses a pipeline of tools.”
- “Skribe’s format-independent document structure and its use of passes to render a document influenced the design of Scribble.”
- “The SLaTeχ (Sitaram 2007) system provides automatic formatting of Scheme code within a LaTeχ document.”
- “In terms of surface syntax, many documentation systems build on either S-expression notation (or its cousin XML) as a way to encode both document structure and program structure. Such representations are especially appropriate for an intermediate representation of documentation, as in DocBook (Walsh and Muellner 2008). S-expression encodings of documentation are especially common in Lisp projects, where data and code are mingled easily.”
- “A documentation language should be designed not by piling escape conventions on top of a comment syntax, but by removing the weaknesses and restrictions of the programming language that make a separate documentation language appear necessary. Scribble demonstrates that a small number of rules for forming documentation, with no restrictions on how they are composed, suffice to form a practical and efficient documentation language that is flexible enough to support the major documentation paradigms in use today.”
— Clinger’s introduction to the RnRS standards, adapted for Scribble
summary
Excellent tool which expands some existing concepts from Skribe (2005). Fundamentally, the prefix “#lang scribble/doc” dispatches parsing based on the module loaded. In particular, S-expressions are considered the building blocks for the language defining the typesetting elements. The primary advantage is support for the “program is data” paradigm in the typesetting context. The disadvantage is the tight coupling between the PLT Scheme infrastructure and the document.
- They split documentation tools into three categories: LaTeχ-like, JavaDoc-like, WEB-like
- The specified S-expression grammar does not allow using unbalanced curly braces in text
- @emph{Yes!} == (emph "Yes!")
@section{Country @emph{and} Western} == (section "Country " (emph "and") " Western")
@itemize[(item "a") (item "b")] == (itemize (item "a") (item "b"))
@title[#:style 'toc]{Contracts} == (title #:style 'toc "Contracts")
@emph{committed by @username} == (emph "committed by " username)
@{Country @emph{and} Western} == ("Country " (emph "and") " Western")
- Section 7 describes the typesetting elements defined
- When a library is installed in this way, its documentation is installed as the library is compiled. PLaneT supports library versioning, and multiple versions of a package can be installed at a time.
Seven great blunders of the computing world §
Title: “Seven great blunders of the computing world” by N. Holmes [url] [dblp]
Published in 2002-07 at and I read it in 2021-08
Abstract:
quotes
- “But we must remember the blunders so we can strike a proper balance between pride and humility—assuming there have indeed been blunders. This column aims to confirm their existence by giving examples.”
- “Unicode’s blunder was in aiming to encode every language rather than every writing system.”
- “IBM’s Ken Iverson and colleagues adapted his reformed mathematical notation, developed at Harvard, to use on computers.”
- “Blunders arise from a failure of imagination, from an inability to see beyond the immediate problem to its full social or professional context.”
summary
The article discusses seven topics that the author considers to have been decided wrongly by the technical community.
Even though the mentioned ‘blunders’ are a neat collection of historically decided debates, I don't think the term ‘blunder’ is justified. Especially blunder 4, “Commercial programming”, is highly controversial in retrospect and mostly asserted without proper arguments. Blunder 1, “Terminology”, on the other hand, made me reflect on the terms “information” and “data”.
Software-based Power Side-Channel Attacks on x86 §
Title: “Software-based Power Side-Channel Attacks on x86” by Moritz Lipp, Andreas Kogler, David Oswald, Michael Schwarz, Catherine Easdon, Claudio Canella, Daniel Gruss [url] [dblp]
Published in 2020-11 at and I read it in 2020-12
Abstract: Power side-channel attacks exploit variations in power consumption to extract secrets from a device, e.g., cryptographic keys. Prior attacks typically required physical access to the target device and specialized equipment such as probes and a high-resolution oscilloscope.
questions
- “Fig. 1: A histogram of the power consumption of various instructions on the i7-6700K (desktop) system.” With the same or different operands?
- How does SGX-Step work? Zero-stepping? I don't get Figure 7
- Interpreting Figure 9
- “However, using zero stepping (Section IV-B2) and the possibility to observe the Hamming weight of bytes (Section III-E), masking is insufficient against our attacks on SGX enclaves.”
- “Timing-Independent Covert Channel”: which information do you want to transmit?
quotes
- “However, until recently, power analysis attacks had two limitations. First, they primarily targeted small embedded microcontrollers rather than more complex high-performance desktop and server CPUs. Second, software-based attacks relying on the available interfaces were so far not successfully applied on x86 to leak fine-grained information, e.g., cryptographic key bits.”
- “CPA [11] is an extension of DPA, which examines the correlation between variations in the set of traces and a leakage model depending on the value of intermediate values [49].” (see the CPA sketch after this quote list)
- “The Intel Running Average Power Limit (RAPL) mechanism was introduced with the Sandy Bridge microarchitecture to ensure the CPU remains within desired thermal and power constraints [27].”
- “Since Haswell, it has provided three distinct capabilities for controlling average power over timescales of multiple seconds, ~10 ms, and <10 ms (PL1, PL2, and PL3, respectively).”
- “Intel defines four different domains for RAPL: package (PKG), power planes (PP0 and PP1), and DRAM.”
- “Intel generally considers physical side-channel attacks on SGX out of scope. Side channels [9], [73], race conditions, and memory-safety violations are not in the threat model,”
- “Note that while we primarily refer to runtime energy consumption rather than power consumption throughout this work, these are directly related, as power = energy ÷ time.”
- “We performed the experiment on our Intel Xeon E3-1240 v5 (server) system, collecting measurements for all possible byte values for 627 hours.”
- “Moreover, note that Intel RAPL does not provide the energy consumption per core but per processor package. Thus, code executed on other cores have a direct influence on the measurement of a specific piece of code running on one core and, thus, the number of overall measurements increases to average out the noise introduced by the other cores.”
- “the attacker needs to align the recorded traces. The trace needs to contain a distinctive feature, e.g., a distinct peak in power consumption, so that traces can be shifted into alignment with each other. While
a privileged attacker can precisely control the victim’s execution and interrupt it at will, an unprivileged attacker cannot. However, if the attacker can control when the execution of the attacked code begins, or use a trigger signal such as a cache-based side channel [72], then the collected traces can be aligned based on that timing information.”
- “We measured over 96 000 execution runs, yielding an overall attack time of 8.11 h on the E3-1275 v5. The result is illustrated in Figure 7.”
- “Incidentally, we note that the key recovery specifically fails for key bytes 0, 4, 8, and 12, i.e., the first byte of each 4-byte word.”
- “However, using zero stepping (Section IV-B2) and the possibility to observe the Hamming weight of bytes (Section III-E), masking is insufficient against our attacks on SGX enclaves.”
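To make the CPA quote above concrete (see the pointer at that quote), here is a minimal correlation power analysis sketch in Python/NumPy. It is not code from the paper: `intermediate` is a placeholder for whatever attacker-chosen target value is used (a real attack would typically target an S-box output), and in the PLATYPUS setting `traces` would be RAPL-derived energy readings.

```python
import numpy as np

def hw(x):
    """Hamming weight of a byte value."""
    return bin(x & 0xFF).count("1")

def cpa_key_byte(traces, inputs, intermediate=lambda pt, k: pt ^ k):
    """Recover one key byte by correlating a Hamming-weight leakage model
    with the measured traces (a sketch, not the paper's tooling).

    traces: (n_traces, n_samples) array of power/energy measurements
    inputs: (n_traces,) array of the known input byte per trace
    """
    traces = np.asarray(traces, dtype=float)
    t = traces - traces.mean(axis=0)
    best_key, best_corr = None, -1.0
    for k in range(256):
        # predicted leakage per trace under key guess k
        h = np.array([hw(intermediate(int(pt), k)) for pt in inputs], dtype=float)
        h -= h.mean()
        # Pearson correlation between the hypothesis and every sample point
        num = h @ t
        den = np.sqrt((h @ h) * (t * t).sum(axis=0)) + 1e-12
        corr = np.abs(num / den).max()
        if corr > best_corr:
            best_key, best_corr = k, corr
    return best_key, best_corr
```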
summary
The paper presents PLATYPUS attacks: novel software-based power side-channel attacks on Intel server, desktop, and laptop CPUs. They exploit unprivileged access to the Intel Running Average Power Limit (RAPL) interface, which exposes values directly correlated with power consumption, forming a low-resolution side channel.
- Targets Linux and Intel CPUs after Sandy Bridge. Might work on AMD and ARM also.
- Voltage package works best, then core package.
- Zero-Stepping is used as a delaying technique
- RAPL allows maintaining power limits in more detail, including monitoring in order to utilize cooling appropriately.
- We are not aware of user applications requiring RAPL access, so restricting access as a countermeasure seems reasonable (see the sampling sketch below).
- Intel: medium severity
- “Observing Intra-Cacheline Activity” is motivated by Intel's internal symmetric cryptography implementation guidelines
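To accompany the countermeasure note above (restricting RAPL access), here is a minimal sketch of how the side channel is sampled from user space on Linux through the powercap sysfs interface. The path below is an assumption and differs per machine; the counter is package-wide, low-resolution, and given in microjoules; and, as far as I know, unprivileged read access to it was removed in response to PLATYPUS.

```python
from pathlib import Path
import time

# Package-domain energy counter exposed by the intel_rapl powercap driver.
# The exact node name is an assumption; it varies between systems and
# nowadays typically requires root.
ENERGY_UJ = Path("/sys/class/powercap/intel-rapl:0/energy_uj")

def sample_energy_trace(n_samples=1000, interval_s=0.001):
    """Record a low-resolution 'power trace' as successive energy deltas (µJ).

    Counter wrap-around is ignored here for brevity.
    """
    prev = int(ENERGY_UJ.read_text())
    trace = []
    for _ in range(n_samples):
        time.sleep(interval_s)
        cur = int(ENERGY_UJ.read_text())
        trace.append(cur - prev)   # energy consumed during this interval
        prev = cur
    return trace
```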
Some instructive mathematical errors §
Title: “Some instructive mathematical errors” by Richard P. Brent [url] [dblp]
Published in 2021-06-20 at and I read it in 2020-08
Abstract: We describe various errors in the mathematical literature, and consider how some of them might have been avoided, or at least detected at an earlier stage, using tools such as Maple or Sage. Our examples are drawn from three broad categories of errors. First, we consider some significant errors made by highly-regarded mathematicians. In some cases these errors were not detected until many years after their publication. Second, we consider in some detail an error that was recently detected by the author. This error in a refereed journal led to further errors by at least one author who relied on the (incorrect) result. Finally, we mention some instructive errors that have been detected in the author’s own published papers.
Comment: 25 pages, 75 references. Corrected typos and added footnote 5 in v2
feedback
Links to Wikipedia should be permalinks.
quotes
- “We describe various errors in the mathematical literature, and consider how some of them might have been avoided,”
- “Since mathematics is a human endeavour, errors can and do occur.”
- “The errors that we consider can be grouped into three broad categories.
- Well-known errors made by prominent mathematicians (see §2).
- Errors discovered by the author in other mathematicians’ work (§3).
- Some errors in, or relevant to, the author’s own work (§4).”
- “A considerably longer list is available online [70].”
- “If the author were not what he is, I would not for a moment hesitate to say that he has made a great mistake here.” (Phragmén, December 1888)
- “Eventually, in December 1889, Poincaré admitted that he had made an error with a critical consequence – his claimed proof of the stability of the solar system was invalid!”
- “Poincaré prepared a corrected version, about twice as long as the original prize entry, and it was eventually published [54]. Poincaré had to pay the extra costs involved, which exceeded the prize money that he had won.”
- “[…] in realising his error and making his corrections, Poincaré discovered the phenomenon of chaos”
- “We remark that several other mathematicians have claimed to prove RH. Some serious attempts are mentioned in [12, Ch. 8].”
- “Wiles worked to repair his proof, first alone, and then with his former student Richard Taylor. By September 1994 they were almost ready to admit defeat. Then, while trying to understand why his approach could not be made to work, Wiles had a sudden insight.”
- “I was sitting at my desk examining the Kolyvagin–Flach method. It wasn’t that I believed I could make it work, but I thought that at least I could explain why it didn’t work. Suddenly I had this incredible revelation. I realised that, the Kolyvagin–Flach method wasn’t working, but it was all I needed to make my original Iwasawa theory work from three years earlier. So out of the ashes of Kolyvagin–Flach seemed to rise the true answer to the problem. It was so indescribably beautiful; it was so simple and so elegant.” (Andrew Wiles, quoted by Simon Singh)
- “At the present time, all that we can say with certainty is that the status of Mochizuki’s proof is unclear. For further information, see [38, 58, 75], and comments on MathOverflow.”
- “The paper [14] contained some significant errors which were not noticed until 1997, when Donald Knuth was revising volume 2 of his classic series The Art of Computer Programming in preparation for publication of the third edition [39].”
- “In a curious twist, it turned out that Knuth’s value was incorrect, because he relied on some of the incorrect results in my paper [14], whereas my value was correct, because I had used a more direct numerical method that depended only on recurrences for certain distribution functions that were given correctly in [14]. With assistance from Flajolet and Vallée, we reached agreement on the correct value of K just in time to meet the deadline for the third edition of [39].”
summary
Nice paper showing how some errors crept into the mathematical literature. Some examples are historical, others are related to the author's work. Apparently the author works in the field of number theory and computational mathematics. Fewer examples are provided from algebra and geometry.
Reproducibility and numeric verification seem to be approaches to combat the underlying problems. I also wonder how much of a difference it would make if formulas were easier to search for. I wonder how accessible SageMath, Maple, and others are for researchers (can they easily verify theorems from fields they are not familiar with?).
Some well-known errors:
- Four-color theorem (Kempe & Tait, 1880) → error found 11 years later
- Mertens and Stieltjes (1897) → claim was lacking a proof
- Poincaré's prize essay (1888) → funny situation due to awarding the prize money
Ad claim 2 - disproof methods:
- algebraic argument
- numeric evaluation (see the sketch below)
- analytical argument
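Since “numeric evaluation” is listed as a disproof method above (see the pointer there), here is a small Python sketch of what that looks like in practice, assuming the Mertens/Stieltjes claim in question is the Mertens conjecture |M(x)| ≤ √x, with M the summatory Möbius function. Every range a desktop can check satisfies the bound, which is exactly why numeric evaluation alone can mislead: the conjecture was later disproved (Odlyzko and te Riele, 1985) without an explicit counterexample being known.

```python
def mobius_upto(n):
    """Möbius function mu(1..n) via a linear sieve."""
    mu = [0] * (n + 1)
    mu[1] = 1
    primes = []
    is_composite = [False] * (n + 1)
    for i in range(2, n + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu

def check_mertens_conjecture(limit):
    """Return the first x <= limit with |M(x)| > sqrt(x), or None."""
    mu = mobius_upto(limit)
    m = 0   # running M(x) = sum of mu(k) for k <= x
    for x in range(1, limit + 1):
        m += mu[x]
        if m * m > x:
            return x, m
    return None

# check_mertens_conjecture(10**6) returns None: no counterexample below 10^6.
```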
Templates vs. Stochastic Methods: A Performance Analys… §
Title: “Templates vs. Stochastic Methods: A Performance Analysis for Side Channel Cryptanalysis” by Benedikt Gierlichs, Kerstin Lemke-Rust, Christof Paar [url] [dblp]
Published in 2006 at CHES 2006 and I read it in 2020-06
Abstract: Template Attacks and the Stochastic Model provide advanced methods for side channel cryptanalysis that make use of ‘a-priori’ knowledge gained from a profiling step. For a systematic comparison of Template Attacks and the Stochastic Model, we use two sets of measurement data that originate from two different microcontrollers and setups. Our main contribution is to capture performance aspects against crucial parameters such as the number of measurements available during profiling and classification. Moreover, optimization techniques are evaluated for both methods under consideration. Especially for a low number of measurements and noisy samples, the use of a T-Test based algorithm for the choice of relevant instants can lead to significant performance gains. As a main result, T-Test based Templates are the method of choice if a high number of samples is available for profiling. However, in case of a low number of samples for profiling, stochastic methods are an alternative and can reach superior efficiency both in terms of profiling and classification.
ambiguity
- page 2: “a multivariate characterization of the noise” → noise is defined as “noise + performed operation”
- page 3: “for each time instant” → undefined term “time instant”
- page 3: “it is the average mi2 of all available samples” → do we square the average? is it already squared?
- page 3: “(P1, …, Pp)” → what is P?
- page 6: “(II) the number of curves for profiling” → which curves? difference curves? they were only defined for the template method not for the stochastic method
quotes
- “Especially for a low number of measurements and noisy samples, the use of a T-Test based algorithm for the choice of relevant instants can lead to significant performance gains.”
- “However, in case of a low number of samples for profiling, stochastic methods are an alternative and can reach superior efficiency both in terms of profiling and classification.”
- “The underlying working hypothesis for side channel cryptanalysis assumes that computations of a cryptographic device have an impact on instantaneous physical observables in the (immediate) vicinity of the device, e.g., power consumption or electromagnetic radiation”
- approaches depending on the number of stages:
  - one-stage approach: directly extract the key
  - two-stage approach: “profiling step”, then “attack step”
- “Templates were introduced as the strongest side channel attack possible from an information theoretic point of view”
- “This is due to the fact that positive and negative differences between the averages may zeroize, which is desirable to filter noise but hides as well valuable peaks that derive from significant signal differences with alternating algebraic sign.”
- “Templates estimate the data-dependent part ht itself, whereas the Stochastic model approximates the linear part of ht in the chosen vector subspace (e.g., F9) and is not capable of including non-linear parts.”
- “The number of measurements, both during profiling and key extraction, is regarded as the relevant and measurable parameter.”
- “We focus on the number of available samples (side channel quality) since computational complexity is of minor importance for the attacks under consideration.”
- “Hence, a general statement on which attack yields better success rates is not feasible as this depends on the number of curves that are available in the profiling step. If a large number of samples is available (e.g., more than twenty thousand), the Template Attack yields higher success rates. If only a small number of samples is available (e.g., less than twenty thousand), stochastic methods are the better choice.” (w.r.t. Metric 3)
- “The Stochastic Model’s strength is the ability to “learn” quickly from a small number of samples. One weakness lies in the reduced precision due to the linear approximation in a vector subspace.”
- “The Template Attack’s weakness is its poor ability to reduce the noise in the side channel samples if the adversary is bounded in the number of samples in the profiling step.”
- “The T-Test Template Attack is the best possible choice in almost all parameter ranges.”
- “For example, using N = 200 profiling measurements and N3 = 10 curves for classification it still achieves a success rate of 81.7%.”
summary
Important empirical results. Parameters and assumptions are okay. Results are fundamental and significant. However, the descriptions of the methods (templates, stochastic) are quite poor; looking up the references is required.
Notes:
- sosd: \sum_{i,j=1}^K (m_i - m_j)^2 for i ≥ j
- sost: \sum_{i,j=1}^K \left(\frac{m_i - m_j}{\sqrt{\frac{\sigma_i^2}{n_i} + \frac{\sigma_j^2}{n_j}}}\right)^2 for i ≥ j
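A minimal NumPy sketch of the two selection criteria noted above, assuming one has per-class sample means m_i, variances σ_i², and trace counts n_i for K classes at every time instant (this is my reading of the formulas, not code from the paper):

```python
import numpy as np

def sosd(means):
    """Sum of squared differences sum_{i>j} (m_i - m_j)^2, per time instant."""
    means = np.asarray(means, dtype=float)            # shape (K, n_points)
    K = means.shape[0]
    total = np.zeros(means.shape[1])
    for i in range(K):
        for j in range(i):                            # i > j covers each pair once
            total += (means[i] - means[j]) ** 2
    return total

def sost(means, variances, counts):
    """T-Test based variant: squared pairwise mean differences normalized by
    the standard error sqrt(var_i/n_i + var_j/n_j), per time instant."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)    # shape (K, n_points)
    counts = np.asarray(counts, dtype=float)          # shape (K,)
    K = means.shape[0]
    total = np.zeros(means.shape[1])
    for i in range(K):
        for j in range(i):
            se = np.sqrt(variances[i] / counts[i] + variances[j] / counts[j])
            total += ((means[i] - means[j]) / se) ** 2
    return total

# Time instants with the largest sost values are the "relevant instants"
# selected for the T-Test based templates.
```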
The Aesthetics of Reading §
Title: “The Aesthetics of Reading” by Kevin Larson, Rosalind Picard [url] [dblp]
Published in 2005 at and I read it in 2021-08
Abstract: In this paper we demonstrate a new methodology that can be used to measure aesthetic differences by examining the cognitive effects produced by elevated mood. Specifically in this paper we examine the benefits of good typography and find that good typography induces a good mood. When participants were asked to read text with either good or poor typography in two studies, the participants who received the good typography performed better on relative subjective duration and on certain cognitive tasks.
quotes
- “Our goal with this project is to develop a measure that is sensitive to improvements in aesthetics. By extending two earlier methodologies we hope to find one that is successful in detecting differences. The first methodology is based on the adage time flies when you’re having fun. Participants’ perception of time is manipulated by the enjoyment of their activity. The second methodology is based on the finding that participants perform better on certain cognitive tasks when they are in a good mood.”
- “Weybrew extended Zeigarnik’s work on task interruption by demonstrating that task interruptions cause participants to overestimate task duration (Weybrew, 1984).”
- “Recent work has turned this finding into a useful usability measure called relative subjective duration (Czerwinski, Horvitz, Cutrell, 2001). Relative subjective duration (RSD) measures participant’s perception of how long they have been performing a task.”
- “Our hope is that RSD not only detects task difficulty, but also aesthetic differences.”
summary
This paper tries to evaluate ClearType's typographic quality in a user study by measuring performance in creative tasks. I think the assumptions are very strong (e.g. “Our hope is that RSD not only detects task difficulty, but also aesthetic differences”).
- The figures give neat examples for good&bad typography
- The goal is to measure improvements in aesthetics
- Study 1: 20 people read a text on a tablet (⇒ 10 with good, 10 with bad typography)
bad typography := bitmap Courier font, 2pts extra between words, worse hyphenation
20min reading time, interrupted after 15min ⇒ relative subjective duration, Likert scale questionnaire, performance in candle task
p-value < 0.05 showed a difference in performance
- Study 2: 20 people, …, interrupted after 17min ⇒ … performance in finding compound words, …
- Assumptions:
- Isen 1987: in positive mood ⇒ perform better on cognitive tasks
- GTAE 2004: good typography ⇒ +17% improved word recognition
- the selected examples are representative
- Weybrew 1984: task interruptions ⇒ overestimated work duration
- CHC 2001: difficult tasks ⇒ duration overestimated, easy ⇒ underestimated (quantified by RSD)
- The p-value was chosen appropriately
- The participants were influenced by typography and not other factors (daytime, …)
- The number of participants is sufficiently high
The Case of Correlatives: A Comparison between Natural… §
Title: “The Case of Correlatives: A Comparison between Natural and Planned Languages” by Federico Gobbo [url] [dblp]
Published in 2011 at Journal of Universal Language 2011 and I read it in 2021-07
Abstract:
quotes
- “Since the publication of Volapük, the most important functional and deictic words present in grammar—interrogative, relative and demonstrative pronouns, and adjectives among others—have been described in planned grammars in a series or a table, namely ‘correlatives,’ showing a considerable level of regularity.”
- “The main result of this comparison is that, in the case of correlatives, some natural languages are surprisingly far more regular than their planned daughters, in spite of the fact that regularity was a major claim of the efforts in planning IALs during the late XIX and early XX centuries in Europe.”
- “Most language planners are men, while women are rare (Yaguello 2001).”
- “Blanke (1985) proposes a scale where to put planned languages following their sociolinguistic success, i.e., the presence and importance of a speech community; this scale starts from ‘project’ (no speech community) until ‘language’ (stable speech community, with presence of family language).”
- “More recent examples of languages planned for non-auxiliary purposes can be found, in particular for literary or fictional ones. For example, Klingon and Na’vi share a lot of characteristics: both were planned as an important part of the background for the science-fiction universes, respectively for the Star Trek saga (see at least Okrand 1992) and James Cameron’s blockbuster movie Avatar (see at least Frommer 2009).”
- “Basing on Bausani (1974), Gobbo (2008) finds another criterion beyond purpose in order to classify planned languages: publicity, i.e., the dichotomy exoteric vs. esoteric. Bausani’s example of esoteric language is again Balaibalan;”
- “Language planners quite often claim that their language is ‘easy’ to learn, compared to competing IALs—and, of course, natural languages. This claim is based on the regularity of structure of IALs: the aim of regularity is that learners acquire a reasonable level of passive and active proficiency quickly and efficiently. But easiness is hardly acceptable as a linguistic dimension: how to measure it?”
- “Regularity is an internal or intralinguistic dimension: no language is completely suppletive, i.e., there are always paradigms that form common transformations in a regular way, which are valid in most cases.”
- “Similarity is an external or interlinguistic dimension: the IAL should have a considerable degree of similarity to the lexicon and writing system of the “source languages,” i.e., natural languages taken as models for planning, so that learners can take advantage from their language repertoire in becoming familiar with the proposed IAL without extra effort.”
- “Bausani (1974) notes that the core features of IALs—in particular phonetics and phonology—are determined by the language repertoire of the language planner, who often chooses unconsciously the sounds belonging to his mother tongue as distinctive features for the phonemes of the IAL.”
- “Correlatives comprehend interrogative clauses and their answers, as well as their relative counterparts.”
- “In particular, with the important exception of the verbant character (I), correlatives can take any grammar character: adjunctive (A), stative (O) or circumstantial (E).”
- “The normative grammars of languages belonging to the Standard Average European (SAE) sprachbund were influenced by Latin grammar”
- “However, regularity in natural languages is only a tendency: for instance in French, the causal correlative is rendered through an analytical strategy, which put together the factual quoi (what) and the preposition pour (for); a similar strategy is also attested in the English what for (5a), while German borrows from the locative in order to fit the same function need (lit. wofür stands for ‘where-for,’ 5b).”
- “Even if their analysis is limited to Western languages, it is worth noticing that regularity in correlatives is not confined in natural languages belonging to the SAE sprachbund, but, on the contrary, it is a tendency found in many natural languages of the world.”
- “The sociolinguistic relative success of Volapük at the end of the XIX century in Europe (Golden 1997) largely influenced other language planners—Zamenhof included, especially in his proto-Esperanto proposed in 1881 (Waringhien 1959, Tresoldi 2011). Schleyer, Volapük’s inventor and owner, largely considered the dimension of regularity more important than the similarity with any natural language, and correlatives are no exceptions.”
- “According to many scholars, this fact was a key factor in the fall of the Volapukist movement and the rise of the Esperantist movement (see Forster 1982, Large 1985).”
- “Most volapukesques were published in Germany or France between the end of the XIX century and the very beginning of the XX century. Couturat & Leau (1903) report five direct reforms of Volapük: Hilbe’s Zahlensprache, Bauer’s Spelin, Fieweger’s Dil, Dormoy’s Balta, Guardiola’s Orba.”
- “They are von Arnim’s Veltparl, published in Opole (Poland) in 1896, and Marchand’s Dilpok, published in Besançon (France) in 1898. Even if some influence of Esperanto can be accounted, their model in language planning was still Volapük.”
- “As the two mathematicians were very precise in their review, it is probable that many language planners did not take correlatives into account, at least as part of the core features of the proposed IAL published at its launch.”
- “In fact, unlike Spelin, Hilbe’s Zahlensprache shows a clear influence by Latin.”
- “All interrogative clauses are always introduced by li—Zahlen-sprache extending the original use of Volapük, where li introduces only yes/no questions.”
- “As a provisional conclusion, volapukesques show very different strategies about correlatives: from absolute regularity in the case of Spelin, to considerable similarity to Latin in Zahlensprache.”
- “In particular, some traits of Esperanto are typically Slavic (Comrie 1996), while most if not all reforms of Esperanto—which are called since Bausani (1974) “esperantidos”—cut off most influences from Slavic languages in particular, but also Germanic ones, with the important exception of English (Gobbo 2005b). In particular, one of the most criticized parts of the Esperanto grammars has always been the correlative system, because it is regular but not similar to any widely used natural language. However, Esperanto correlatives show a considerable degree of similarity with Lithuanian ones (see Table 11 below), Lithuanian being part of Zamenhof’s repertoire (Künzli 2010).”
- “The first esperantido was a reform proposed by Zamenhof himself to the readers of the first journal written in Esperanto, La Esperantisto. The reform was rejected by the readers themselves after a referendum in 1894 (Haupenthal 1988).”
- “Esperanto (1894) retains all the wideness of the Esperanto correlative system, but it sacrifices regularity in favor of similarity with Latin, as shown in Table 11.”
- “Table 13 (below) shows that Ido retains part of the regularity of Esperanto, although its morphemes are fairly more similar to Latin and therefore more familiar to most learners in 1908, when Ido was published:”
- “However, it is not intuitive why the Latin prefix qu- is retained in some forms (e.g., Ido and Latin quo, English what) but not in others—for instance, the Latin quando becomes in Ido kande.”
- “Furthermore, the authoritative grammar of Ido by de Beaufront—recently (2005) republished in digital form—uses the word ‘correlatives’ only once: the author programmatically rejects to consider these words as a regular system, in order to take a distance from Esperanto.”
- “As described in section 1, there are three dimensions of analysis of planned languages: publicity, purpose (e.g., auxiliary, religious, fictional) and the diamesic axis (i.e., the main channel of use, whether written or spoken).”
- “Interestingly, no Tolkien’s language apparently shows any correlative. Kloczko (2002, 2004) has extracted a corpus-based grammar of every language planned by Tolkien, complete with the appropriate dictionaries. Unfortunately, most texts are poems, where correlatives seem to be never used. After all, the aim of Tolkien was not an actual use by other people, but rather they served as part of the background of Middle-Earth, its fictional world described in his novels and essays, which is firmly grounded in Old and Middle English language and literature (Solopova 2009).”
- “In sum, the strategy behind the language planning of Klingon is regularity in syntax, while morphology is highly idiosyncratic, perhaps to increase the appearance of “exoticism” in the Klingon language.”
- “Unlike Klingon, the syntax of Na’vi is rich and complex. For example, there are six cases: subjective, agentive, patientive, dative, genitive, and topical—it is worth noticing that Na’vi is pragmatically split-ergative too.”
- “In Na’vi, there is the attributive (Tesneèrian symbol: A) particle a which is used to transform the whole sentence in a adjunctive, so that relatives follow the same rules of adjectives.”
- “For instance, the English expression the man on the moon is rendered literally as ‘the on-moon-attribute man.’ Moreover, there is a ‘resumptive pronoun’ for animate heads (po) and one for inanimate (tsaw) when the head of the relative clause is neither the subject nor the direct object (Annis 2011: 46).”
- “Moreover, after Esperanto, correlatives were often presented as an apart category in grammars: when this happens, correlatives tend to show a considerable level of regularity.”
- “In conclusion, it can be said that planning a language—for whatever purpose—is not an easy task: the language planner should not only master the natural languages to be used as sources, but also he or she needs to understand the structural principles that underlie their grammars. This is particularly true in the case of correlatives.”
summary
A very neat paper from the field of linguistics. The conceptual groundwork is established first based on grammatical categories: Bausani (1974) and Gobbo (2008) distinguish esoteric/exoteric languages, Blanke (1985) differentiates project from language on a one-dimensional axis, and Tesnière (1959) provides the grammar characters.
In the following, planned languages are discussed, with their peak development at the transition from the 19th to the 20th century and a focus on the Standard Average European sprachbund. Here the distinction between similarity (with established languages) and regularity (formed along formal principles) is pointed out as fundamental, related to the notion of suppletion. The paper then contributes the correlatives of various languages: {English, German, French, Latin, Volapük, Spelin, Zahlensprache, Lithuanian, Esperanto, Esperanto (1894), Ido, Novial, Neo, Interlingua, Klingon, Na'vi}. The paper appropriately discusses the relations between those languages.
The main result is given as “some natural languages are surprisingly far more regular than their planned daughters in spite of the fact that regularity was a major claim of the efforts in planning IALs during the late 19th and early 20th century”. In the end, the result is less unexpected than it sounds: as the discussed developments show, similarity is sometimes favored over regularity, thus neglecting a regular table of correlatives. Furthermore, some of the discussed auxiliary languages never had simplicity/easiness/regularity as their main goal. In essence, Esperanto, an auxiliary language, shows the most regular system.
- Most of the tables contain tables of correlatives and are very informative.
- The history of planned/auxiliary languages in Europe is provided as a neat primer
- suppletive: the inflected form does not reuse the root of the word but is a completely different word (e.g., “went” as the past tense of “go”)
typo
page 73, remove “took”
The European Union and the Semantic Web §
Title: “The European Union and the Semantic Web” by Neville Holmes [url] [dblp]
Published in 2008 at and I read it in 2021-09
Abstract:
quotes
- “In the essay, I deplored the failure of the European Union to streamline the translation of their documents into the Union’s various official languages and proposed E-speranto, a simplified dialect of Esperanto, as an intermediate language to form the basis for the streamlining.”
- “‘Esperanto, anyone?’—by Robert Glass in his March/April 2008 IEEE Software column on the unsuitability of English as a lingua franca”
- “Thus a signals here and now, o signals ahead or the future, i behind or the past, u conditional or propositional, and e perpetual or definitional.”
- “The proliferation of formal vocabularies for the Semantic Web seems to result from various interest groups each focused on supporting their own needs. This is rather like a library divided into interest sections, each with its own independent topic classification. This makes cross-disciplinary research difficult and adventitious discovery through browsing less likely.”
- “At the topmost level, items of the vocabulary will have five phonemes: initial, prefix, vowel, suffix, and final.”
- “Ultimately, establishing an intermediary universal vocabulary could make automatic indexing
and searching of text on the Web more effective and independent of the source language.”
summary
Without a particular reason, the author creates the E-speranto proposal, a “simplification” of Esperanto. The elements are appropriately discussed for an introductory article, but the relationship to the EU also remains unmotivated. In the end, the author emphasizes the lexicon, which is interesting to consider as a concept separate from the language.
The proposal:
Word endings: As an Indo-European language, E-speranto uses endings to express grammatical qualification of word stems. Thus, adverbs end in –e, infinitives in –i, and imperatives in –u. Synthesis starts showing in noun and adjective endings. Using a BNF-like notation, these are –[a|o](y)(n) where the required a or o signal an adjective or noun respectively, the optional y signals plurality, and the optional n signals accusativity. Verb endings use the five vowels for a kind of temporal placement. Thus a signals here and now, o signals ahead or the future, i behind or the past, u conditional or propositional, and e perpetual or definitional. The full verb endings are –as for present tense, –os for future tense, –is for past tense, –us for conditional, and –es for perpetual. These verb endings can also be used as morphemes, so that osa is the adjectival future and la aso means the present. The vowels are used as placers in other synthetic syllables that can be used as suffixes or ordinary morphemes. The formula [vowel](n)(t) yields 10 morphemes that qualify actions. With the n the action is active, without it passive, so amanta and amata are adjectives meaning loving and loved, and ante and ate are adverbs meaning actively and passively, all these set in the present. This construction can be simply extended to specify horizontal spatial placement by [vowel](m)(p) and vertical by [vowel](n)(k), where the m and n specify placement relative to the sentence’s subject. Also, u means left and e means right. Thus la domompo means the house ahead while la domopo means the front of the house. Such compounds can take some getting used to, but they are regular and powerful.
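Because the ending scheme described above is fully regular, it is easy to mechanize. The following Python sketch covers only the noun/adjective and verb endings as I read the description; the function names and examples are mine, not from the article.

```python
def nominal(stem, word_class, plural=False, accusative=False):
    """Noun/adjective endings per the scheme -[a|o](y)(n): 'a' marks an
    adjective, 'o' a noun, optional 'y' plurality, optional 'n' accusative."""
    ending = {"adjective": "a", "noun": "o"}[word_class]
    if plural:
        ending += "y"
    if accusative:
        ending += "n"
    return stem + ending

# temporal placement by vowel: a = present, o = future, i = past,
# u = conditional, e = perpetual/definitional
VERB_ENDINGS = {"present": "as", "future": "os", "past": "is",
                "conditional": "us", "perpetual": "es"}

def verb(stem, tense):
    return stem + VERB_ENDINGS[tense]

# nominal("dom", "noun", plural=True, accusative=True)  ->  "domoyn"
# verb("am", "future")                                   ->  "amos"
```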
Structural words: Synthesis at the phonemic level is even more expressive in its use when building the pronouns and correlatives essential to expressiveness. In Esperanto these center on the vowel i, with affixed phonemes to give the particular class of meaning. The singular pronouns are mi, ci, li, xi, and ji for I, thou, he, she, and it, and si for reflexion. Here I use E-speranto spellings where c is pronounced like the ts in tsunami, and x as sh. There are also two plural pronouns: ni and vi for we and you. There are also two prefixes that can be used: i- to widen the scope and o- to abstract it. Thus ili means he and his associates or they, and omi means someone like me or one. The pronouns can also take grammatical suffixes such as –n for the accusative (min means me) and –a for the adjectival (mia means my). The correlatives are two-dimensional, with one range of meanings given by a suffix, and an independent range by a prefix. The generic correlatives have no prefix. Having a simple vowel suffix, ia, ie, io, and iu mean respectively some kind of, somewhere, something, and somebody. Having a vowel+consonant suffix, ial, iam, iel, ies, iol, and iom, mean respectively for some reason, at some time, in some way, someone’s, in some number, and in some amount. The specific correlatives apply a variety of prefixes to the generic correlatives. The prefixes are k–, t–, nen–, and q– (pronounced ch–), and they give selective, indicative, negative, and inclusive meanings. For example, kiam, tiam, neniam, and qiam mean when, then, never, and always, respectively. This description shows how phonemic synthesis can yield dramatic richness simply. Further, the correlatives can also take the i– and o– scoping prefixes, and both the pronouns and correlatives can be grammatically suffixed.
The Implementation of Lua 5.0 §
Title: “The Implementation of Lua 5.0” by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar Celes [url] [dblp]
Published in 2005 at and I read it in 2022-12-10
Abstract: We discuss the main novelties of the implementation of Lua 5.0: its registerbased virtual machine, the new algorithm for optimizing tables used as arrays, the implementation of closures, and the addition of coroutines.
quotes
- “Currently the compiler accounts for approximately 30% of the size of the Lua core.” (Ierusalimschy et al., p. 3)
- “the hand-written parser is smaller, more efficient,more portable, and fully reentrant.” (Ierusalimschy et al., p. 3)
- “Lua is really lightweight: for instance, on Linux its stand-alone interpreter, complete with all standard libraries, takes less than 150 Kbytes; the core is less than 100 Kbytes.” (Ierusalimschy et al., p. 4)
- “We also consider Lua a simple language, being syntactically similar to Pascal and semantically similar to Scheme, but this is subjective.” (Ierusalimschy et al., p. 4)
- “Lua represents values as tagged unions, that is, as pairs (t, v), where t is an integer tag identifying the type of the value v, which is a union of C types implementing Lua types.” (Ierusalimschy et al., p. 5)
- “One consequence of using tagged unions to represent Lua values is that copying values is a little expensive: on a 32-bit machine with 64-bit doubles, the size of a TObject is 12 bytes (or 16 bytes, if doubles are aligned on 8-byte boundaries) and so copying a value requires copying 3 (or 4) machine words.” (Ierusalimschy et al., p. 5)
- “The hash function does not look at all bytes of the string if the string is too long.” (Ierusalimschy et al., p. 6)
- “The hash part uses a mix of chained scatter table with Brent's variation [3].” (Ierusalimschy et al., p. 8)
- “Most procedural languages avoid this problem by restricting lexical scoping (e.g., Python), not providing first-class functions (e.g., Pascal), or both (e.g., C). Functional languages do not have those restrictions. Research in non-pure functional languages like Scheme and ML has created a vast body of knowledge about compilation techniques for closures (e.g., [19, 1,21]).” (Ierusalimschy et al., p. 9)
- “For instance, just the control flow analysis of Bigloo, an optimizer Scheme compiler [20], is more than ten times larger than the whole Lua implementation: The source for module Cfa of Bigloo 2.6f has 106,350 lines, versus 10,155 lines for the core of Lua 5.0. As explained in Section 2, Lua needs something simpler.” (Ierusalimschy et al., p. 9)
- “Lua uses a structure called an upvalue to implement closures. Any outer local variable is accessed indirectly through an upvalue. The upvalue originally points to the stack slot wherein the variable lives (Figure 4, left). When the variable goes out of scope, it migrates into a slot inside the upvalue itself (Figure 4, right).” (Ierusalimschy et al., p. 9) (sketched in code after this quote list)
- “Coroutines in Lua are stackful, in the sense that we can suspend a coroutine from inside any number of nested calls.” (Ierusalimschy et al., p. 10)
- “As such, they allow programmers to implement several advanced control mechanisms, such as cooperative multithreading, generators, symmetric coroutines, backtracking, etc. [7].” (Ierusalimschy et al., p. 10)
- “A key point in the implementation of coroutines in Lua is that the interpreter cannot use its internal C stack to implement calls in the interpreted code. (The Python community calls an interpreter that follows that restriction a stackless interpreter [23].)” (Ierusalimschy et al., p. 11)
- “Because a function running in a coroutine may have been created in another coroutine, it may refer to variables in a different stack. This leads to what some authors call a cactus structure [18]. The use of flat closures, as we discussed in Section 5, avoids this problem altogether.” (Ierusalimschy et al., p. 11)
- “Most instructions in a stack machine have implicit operands.” (Ierusalimschy et al., p. 12)
- “There are 35 instructions in Lua's virtual machine.” (Ierusalimschy et al., p. 12)
- “Branch instructions pose a difficulty because they need to specify two operands to be compared plus a jump offset.” (Ierusalimschy et al., p. 14)
- “The solution adopted in Lua is that, conceptually, a test instruction simply skips the next instruction when the test fails;” (Ierusalimschy et al., p. 14)
- “For function calls, Lua uses a kind of register window. It evaluates the call arguments in successive registers, starting with the first unused register. When it performs the call, those registers become part of the activation record of the called function, which therefore can access its parameters as regular local variables.” (Ierusalimschy et al., p. 15)
- “Lua uses two parallel stacks for function calls.” (Ierusalimschy et al., p. 15)
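The quoted value representation (p. 5) and upvalue mechanism (p. 9) can be sketched roughly as follows; this is my own simplified illustration with made-up names, not the actual Lua 5.0 source.

```c
#include <stdio.h>

/* Rough sketch of a tagged-union value and of upvalues, as described in the
   quotes above; identifiers are illustrative, not the real Lua 5.0 names. */
typedef union {
    double  n;   /* numbers */
    void   *p;   /* collectable objects */
    int     b;   /* booleans */
} Value;

typedef struct {
    int   tag;   /* which union member is valid */
    Value value;
} TObject;       /* 12 or 16 bytes on the 32-bit machine discussed in the paper */

/* An upvalue first points at a live stack slot; when the variable goes out of
   scope, the value migrates into the upvalue itself ("closing"). */
typedef struct {
    TObject *v;      /* points into the stack while open, to `closed` afterwards */
    TObject  closed;
} UpVal;

static void close_upvalue(UpVal *uv) {
    uv->closed = *uv->v;   /* copy the value out of the dying stack slot */
    uv->v = &uv->closed;   /* the upvalue now owns the value */
}

int main(void) {
    printf("sizeof(TObject) = %zu bytes\n", sizeof(TObject));
    TObject stack_slot = { 0, { .n = 42.0 } };
    UpVal uv = { &stack_slot, { 0, { .n = 0.0 } } };
    close_upvalue(&uv);    /* the local variable leaves the stack */
    printf("closed value: %g\n", uv.v->value.n);
    return 0;
}
```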
summary
Wonderful read with a beautiful overview of the Lua 5.0 interpreter runtime design. The section titles outline the content: {The Representation of Values, Tables, Functions and Closures, Threads and Coroutines, The Virtual Machine}.
unclarity
- “The equivalent Lua program, a={[1000000000]=1}, creates a table with a single entry.” (Ierusalimschy et al., p. 6)
  But the two-table design described later seems to contradict this statement: since the array part skips storing the indices, a huge array allocation would again be required.
The Profession as a Culture Killer §
Title: “The Profession as a Culture Killer” by Neville Holmes [url] [dblp]
Published in 2007 at and I read it in 2021-09
Abstract:
quotes
- “The computing profession and industry, however, saw this crudeness as simplicity and promulgated separate sets of similar crudity to cater to other cultural needs.”
- “To my dismay, I recently read in a local newspaper that ICANN, the Internet Corporation for Assigned Names and Numbers, would begin allowing non-Latin characters in domain names. Getting it done will be immensely complicated, and the result will likely be chaotic. Worse, it shows a complete disregard for mankind’s second greatest digital technology: writing.”
- “From this, it can be seen that the Internet could well provide a separate DNS for each writing system without compromising, and maybe even helping, its end-to-end model and unique binary naming.”
- “Scientists and engineers using programming languages like Fortran were thus not only restricted to capital letters but also had only the hyphen as a basic mathematical symbol. This led to the replacement of the traditional arithmetic symbols by commercial ones: multiplication’s saltire (×) by the asterisk (*), division’s obelus (÷) by the virgule (/), and even addition’s plus (+) by the ampersand (&), although users could pay extra money to get printer features that replaced ampersands with plusses.”
- “Indeed, the neglect by computing professionals of the Latin writing system’s culture brings about a multitude of problems.”
- “What we have now disgraces our profession.”
- “A good place to start would be to kill the Universal Web”
summary
Mentions a myriad of topics. His discussion of the ASCII character set is informative, but his remarks about technology and writing systems are incomplete and biased.
- “Worse, it shows a complete disregard for mankind’s second greatest digital technology: writing”
  ⇒ unjustified
- “In any case, users could mix URLs for different Webs if needed because each DNS would translate the domain names for the different Webs into the underlying Internet addresses.”
  ⇒ requires clear definition of distinct writing systems
- “Indeed, the neglect by computing professionals of the Latin writing system’s culture brings about a multitude of problems”
  ⇒ Who is responsible? You assume the non-Latin people
- “It’s also faster to read and more economical of space than linear alphabetic writing systems.”
  ⇒ But the problems are not discussed
The Security Risk of Lacking Compiler Protection in We… §
Title: “The Security Risk of Lacking Compiler Protection in WebAssembly” by Quentin Stievenart, Coen De Roover, Mohammad Ghafari [url] [dblp]
Published in 2021 at QRS 2021 and I read it in 2022-12-12
Abstract: WebAssembly is increasingly used as the compilation target for cross-platform applications. In this paper, we investigate whether one can rely on the security measures enforced by existing C compilers when compiling C programs to WebAssembly. We compiled 4,469 C programs with known buffer overflow vulnerabilities to x86 code and to WebAssembly, and observed the outcome of the execution of the generated code to differ for 1,088 programs. Through manual inspection, we identified that the root cause for these is the lack of security measures such as stack canaries in the generated WebAssembly: while x86 code crashes upon a stack-based buffer overflow, the corresponding WebAssembly continues to be executed. We conclude that compiling an existing C program to WebAssembly without additional precautions may hamper its security, and we encourage more research in this direction.
potential contradiction
- “We performed our evaluation using -O1 as the optimisation level.” (Stievenart et al., p. 7)
- “For this reason, we decided to use -O2 as the level of optimization.” (Stievenart et al., p. 6)
quotes
- “We compiled 4,469 C programs with known buffer overflow vulnerabilities to x86 code and to WebAssembly, and observed the outcome of the execution of the generated code to differ for 1,088 programs. Through manual inspection, we identified that the root cause for these is the lack of security measures such as stack canaries in the generated WebAssembly: while x86 code crashes upon a stack-based buffer overflow, the corresponding WebAssembly continues to be executed.” (Stievenart et al., p. 1)
- “The standard has been designed with security in mind, as evidenced among others by the strict separation of application memory from the execution environment’s memory. Thanks to this separation, a compromised WebAssembly binary cannot compromise the browser that executes the binary [2], [7].” (Stievenart et al., p. 1)
- “This contradicts the design document of the WebAssembly standard [2] which states “common mitigations such as data execution prevention (DEP) and stack smashing protection (SSP) are not needed by WebAssembly programs”” (Stievenart et al., p. 1)
- “The execution model of WebAssembly is stack-based” (Stievenart et al., p. 2)
- “A WebAssembly program also contains a single linear memory, i.e., a consecutive sequence of bytes that can be read from and written to by specific instructions.” (Stievenart et al., p. 2)
- “An example function is the following:
  (func $main (type 4)
  (param i32 i32)
  (result i32)
  (local i32)
  local.get 0)” (Stievenart et al., p. 2) [with two params, one return value, one local variable]
- “After the last instruction, the function execution ends and the value remaining on the stack is the return value.” (Stievenart et al., p. 2)
- “g0 acts as the stack pointer.” (Stievenart et al., p. 2)
- “Moreover, multiple features of WebAssembly render programs less vulnerable than their native equivalent. Unlike in x86, the return address of a function in WebAssembly is implicit and can only be accessed by the execution environment, preventing return-oriented programming attacks among others, diminishing the potential of stack smashing attacks. Also, function pointers are supported in WebAssembly through indirect calls, where the target function of the call is contained in a statically-defined table: this reduces the number of possible control flow exploits.” (Stievenart et al., p. 3)
- “We rely on the Juliet Test Suite 1.3 for C of the Software Assurance Reference Dataset [3], released in October 2017 and which has been used to compare static analysis tools that detect security issues in C and Java applications [5].” (Stievenart et al., p. 3)
- “In total, for CWE 121, we have 2785/5906 programs (47%) that can be compiled to WebAssembly, and for CWE 122, we have 1666/3656 programs (46%) that can be compiled.” (Stievenart et al., p. 3)
- “Moreover, stack smashing protection need to be ensured by the compiler rather than the WebAssembly runtime, that has no knowledge of how the program stack is managed.” (Stievenart et al., p. 5)
- “We now turn our attention to the impact of optimisations on the ability to prevent vulnerable code to make it into the binary. To illustrate this, we inspect one specific example that exhibits different behaviour depending on the optimisation levels, which is in essence similar to our first example.” (Stievenart et al., p. 5)
- “For instance, one aspect which we have not encountered in the programs we inspected is that there is no use of function pointers.” (Stievenart et al., p. 7)
- “Arteaga et al. [4] propose an approach to achieve code diversification for WebAssembly: given an input program, multiple variants of this program can be generated. Narayan et al. [14] propose Swivel, a new compiler framework for hardening WebAssembly binaries against Spectre attacks, which can compromise the isolation guarantee of WebAssembly. Stiévenart and De Roover propose a static analysis framework [16] for WebAssembly, used to build an information flow analysis [17] to detect higher-level security concerns such as leaks of sensitive information.” (Stievenart et al., p. 7)
summary
4,469 C programs (from a corpus of static code analysis test cases) are compiled to WASM, and differences in behavior between the two platforms are observed. 1,088 programs showed different behavior. Specifically, it is simply concluded that WASM does not provide stack smashing detection. The study is useful, but limited in scope and with an expected result.
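To make the class of test programs concrete: a minimal C sketch in the spirit of the Juliet CWE-121 (stack-based buffer overflow) cases, written by me and not taken from the paper. Compiled natively with a stack protector (e.g. gcc -O1 -fstack-protector-strong), the overflow is detected and the process aborts; compiled to WebAssembly, no canary is emitted and execution typically continues past the corrupted frame, which is exactly the behavioral difference the study measures.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical CWE-121-style example: an unbounded copy overflows a small
   stack buffer. Behavior differs between a canary-protected native build
   and a WebAssembly build without stack smashing protection. */
static void overflow(const char *attacker_controlled) {
    char buf[8];
    strcpy(buf, attacker_controlled);   /* writes far past the end of buf */
    printf("copied: %s\n", buf);
}

int main(void) {
    overflow("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
    printf("still running\n");          /* typically reached in the WASM build only */
    return 0;
}
```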
The UNIX Time-Sharing System §
Title: “The UNIX Time-Sharing System” by Dennis M Ritchie, Ken Thompson, Bell Laboratories [url] [dblp]
Published in 1974-07 at Communications of the ACM and I read it in 2021-03
Abstract:
contradiction
- quote 1: “The size of an ordinary file is determined by the highest byte written on it; no predetermination of the size of a file is necessary or possible”
- quote 2: “The entry thereby found (the file's i-node) contains the description of the file as follows.” … “4. Its size”
Funny process
9.2 Per day (24-hour day, 7-day week basis)
There is a "background" process that runs at the lowest possible priority; it is used to soak up any idle CPU time. It has been used to produce a million-digit approximation to the constant e - 2, and is now generating composite pseudoprimes (base 2).
quotes
- “UNIX is a general-purpose, multi-user, interactive operating system for the Digital Equipment Corporation PDP-11/40 and 11/45 computers. It offers a number of features seldom found even in larger operating systems, including: (1) a hierarchical file system incorporating demountable volumes; (2) compatible file, device, and inter-process I/O; (3) the ability to initiate asynchronous processes; (4) system command language selectable on a per-user basis; and (5) over 100 subsystems including a dozen languages.”
- “Our own installation is used mainly for research in operating systems, languages, computer networks, and other topics in computer science, and also for document preparation. Perhaps the most important achievement of UNIX is to demonstrate that a powerful operating system for interactive use need not be expensive either in equipment or in human effort: UNIX can run on hardware costing as little as $40,000, and less than two man-years were spent on the main system software”
- “There is also a host of maintenance, utility, recreation, and novelty programs. All of these programs were written locally. It is worth noting that the system is totally self-supporting. All UNIX software is maintained under UNIX; likewise, UNIX documents are generated and formatted by the UNIX editor and text formatting program.”
- “The PDP-11/45 on which our UNIX installation is implemented is a 16-bit word (8-bit byte) computer with 144K bytes of core memory; UNIX occupies 42K bytes. This system, however, includes a very large number of device drivers and enjoys a generous allotment of space for I/O buffers and system tables; a minimal system capable of running the software mentioned above can require as little as 50K bytes of core altogether.”
- “The greater part of UNIX software is written in the above-mentioned C language [6]. Early versions of the operating system were written in assembly language, but during the summer of 1973, it was rewritten in C. The size of the new system is about one third greater than the old.”
- “UNIX differs from other systems in which linking is permitted in that all links to a file have equal status. That is, a file does not exist within a particular directory; the directory entry for a file consists merely of its name and a pointer to the information actually describing the file. Thus a file exists independently of any directory entry, although in practice a file is made to disappear along with the last link to it.”
- “There is a threefold advantage in treating I/O devices this way: file and device I/O are as similar as possible; file and device names have the same syntax and meaning, so that a program expecting a file name as a parameter can be passed a device name; finally, special files are subject to the same protection mechanism as regular files”
- “After the mount, there is virtually no distinction between files on the removable volume and those in the permanent file system.”
- “There is only one exception to the rule of identical treatment of files on different devices: no link may exist between one file system hierarchy and another. This restriction is enforced so as to avoid the elaborate bookkeeping which would otherwise be required to assure removal of the links when the removable volume is finally dismounted.”
- “Although the access control scheme in UNIX is quite simple, it has some unusual features. Each user of the system is assigned a unique user identification number. When a file is created, it is marked with the user ID of its owner.”
- “If the seventh bit is on, the system will temporarily change the user identification of the current user to that of the creator of the file whenever the file is executed as a program. This change in user ID is effective only during the execution of the program which calls for it. The set-user-ID feature provides for privileged programs which may use files inaccessible to other users. For example, a program may keep an accounting file which should neither be read nor changed except by the program itself. If the set-user-identification bit is on for the program, it may access the file although this access might be forbidden to other programs invoked by the given program's user.”
- “There is no distinction between ‘random’ and ‘sequential’ I/O, nor is any logical record size imposed by the system.”
- “It should be said that the system has sufficient internal interlocks to maintain the logical consistency of the file system when two users engage simultaneously in such inconvenient activities as writing on the same file, creating files in the same directory, or deleting each other's open files.”
- “The space on all fixed or removable disks which contain a file system is divided into a number of 512-byte blocks logically addressed from 0 up to a limit which depends on the device. There is space in the i-node of each file for eight device addresses. A small (nonspecial) file fits into eight or fewer blocks; in this case the addresses of the blocks themselves are stored. For large (nonspecial) files, each of the eight device addresses may point to an indirect block of 256 addresses of blocks constituting the file itself. Thus files may be as large as 8 • 256 • 512, or 1,048,576 (2^20) bytes.”
- “Except while UNIX is bootstrapping itself into operation, a new process can come into existence only by use of the fork system call: processid = fork(label)”
- “Although interprocess communication via pipes is a quite valuable tool (see §6.2), it is not a completely general mechanism since the pipe must be set up by a common ancestor of the processes involved.”
- “If file command cannot be found, the Shell prefixes the string /bin/ to command and attempts again to find the file. Directory /bin contains all the commands intended to be generally used”
- “An extension of the standard I/O notion is used to direct output from one command to the input of another. A sequence of commands separated by vertical bars causes the Shell to execute all the commands simultaneously and to arrange that the standard output of each command be delivered to the standard input of the next command in the sequence”
- “A program such as pr which copies its standard input to its standard output (with processing) is called a filter.”
- “The PDP-11 hardware detects a number of program faults, such as references to nonexistent memory, unimplemented instructions, and odd addresses used where an even address is required. Such faults cause the processor to trap to a system routine. When an illegal action is caught, unless other arrangements have been made, the system terminates the process and writes the user's image on file core in the current directory.”
- “Perhaps paradoxically, the success of UNIX is largely due to the fact that it was not designed to meet any predefined objectives”
- “The fork operation, essentially as we implemented it, was present in the Berkeley time-sharing system [8]. On a number of points we were influenced by Multics, which suggested the particular form of the I/O system calls [9] and both the name of the Shell and its general functions.”
- “A ‘crash’ is an unscheduled system reboot or halt. There is about one crash every other day;”
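As a quick check of the quoted addressing scheme, a tiny sketch of my own working out the file-size limit:

```c
#include <stdio.h>

/* An i-node holds 8 block addresses; for large files each address points to
   an indirect block of 256 block addresses; blocks are 512 bytes. */
int main(void) {
    long direct_slots    = 8;
    long addrs_per_block = 256;
    long block_size      = 512;
    long max_file_size   = direct_slots * addrs_per_block * block_size;
    printf("max file size: %ld bytes (2^20 = %d)\n", max_file_size, 1 << 20);
    return 0;
}
```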
summary
Interesting retrospective read.
It was interesting to observe what has changed since then. Though I assume in 1974 they were quite experienced with their system design, such a huge design necessarily has a lot of controversial points.
The paper explains the filesystem and the shell as core concepts. I think the idle background process and the casual statement that “there is about one crash every other day” are funny from today's perspective.
Underspecified statement
“To provide an indication of the overall efficiency of UNIX and of the file system in particular, timings were made of the assembly of a 7621-line program. The assembly was run alone on the machine; the total clock time was 35.9 sec, for a rate of 212 lines per sec.”
… which program? What does it do?
The design of a Unicode font §
Title: “The design of a Unicode font” by C Bigelow, K Holmes [url] [dblp]
Published in 1993 at and I read it in 2020-10
Abstract: The international scope of computing, digital information interchange, and electronic publishing has created a need for world-wide character encoding standards. Unicode is a comprehensive standard designed to meet such a need. To be readable by humans, character codes require fonts that provide visual images — glyphs — corresponding to the codes. The design of a font developed to provide a portion of the Unicode standard is described and discussed.
ambiguity
“The inclusion of non-alphabetic symbols and non-Latin letters in these 8-bit character
sets required font developers to decide whether the assorted symbols and non-Latin letters
should be style-specific or generic.”
A definition of style-specific and generic would be nice.
“Hence, although accents need to be clearly differentiated, they do not need to be emphatic, and, indeed, overly tall or heavy accents can be more distracting than helpful to readers.”
What does emphatic mean in this context?
nicely written
“To design such a font is a way to study and appreciate, on a microcosmic scale, the manifold variety of literate culture and history.”
“A ‘roman’ typeface design (perhaps upright would be a less culture-bound term, since a Unicode font is likely to include Greek, Cyrillic, Hebrew, and other scripts)” …
question
One of the advantages of Unicode is that it includes a Generic Diacritical Marks set of ‘floating’ diacritics that can be combined with arbitrary letters. These are not case-sensitive, i.e. there is only one set of floating diacritics for both capitals and lowercase
Isn't this a property of the font file format?
quotes
- “The Unicode standard distinguishes between characters and glyphs in the following way: ‘Characters reside only in the machine, as strings in memory or on disk, in the backing store.”
- “In contrast to characters, glyphs appear on the screen or paper as particular representations of one or more backing store characters. A repertoire of glyphs comprises a font.’”
- “By script or writing system we mean a graphical representation of language.”
- “Accumulated contextual glyphic variations are what transformed capitals into lowercase, for example, and turned roman into italic.”
- “Version 1.0 of the standard encodes approximately 28,000 characters, of which some 3,000 are alphabetic letters and diacritical marks for European, Indic, and Asian languages, 1,000 are various symbols and graphic elements, and 24,000 are ideographic (logographic), syllabic, and phonetic characters used in Chinese, Japanese, and Korean scripts [1,2].”
- “Although the textura typeface used by Gutenberg in the 42-line Bible of 1455-56, the first printed book in Europe, included more than 250 characters, and the humanistica corsiva typeface cut by Francesco Griffo for the Aldine Virgil of 1501, the first book printed in italic, included more than 200 characters, character sets became smaller in later fonts, to reduce the costs of cutting, founding, composing, and distributing type.”
- “Within one font, we wanted the different alphabets and symbols to maintain a single design theme.”
- “A problem with this kind of haphazard development is that the typographic features of documents, programming windows, screen displays, and other text-based images will not be preserved when transported between systems using different fonts.”
- “By ‘harmonization’, we mean that the basic weights and alignments of disparate alphabets are regularized and tuned to work together, so that their inessential differences are minimized, but their essential, meaningful differences preserved.”
- “Latinate typographers”
- “As designers, we would like to know how the ‘rules’ that govern legibility in non-Latin scripts compare to the rules for Latin typefaces. Shimron and Navon, for example, report a significant difference in the spatial distribution of distinctive features in the Roman and Hebrew alphabets [20].”
- “Although many typographic purists believe that simple obliques are inferior to true cursives, Stanley Morison argued in 1926 that the ideal italic companion to roman should be an inclined version of the roman [21].”
- “The European mode of distinction between formal and cursive type forms is not as strong a tradition in some non-Latin scripts, e.g. Hebrew (though there may be other kinds of highly regulated graphical distinctions), so a simple oblique is a more universal, albeit minimal, graphic distinction that can apply equally to all non-Latin scripts.”
- “we concluded that, for improved legibility in international text composition, accents and diacritics should be designed somewhat differently than in the standard version of Lucida Sans.”
- “Accordingly, we designed the lowercase diacritics of Lucida Sans Unicode to be slightly taller and a little different in modulation than those of the original Lucida Sans. Following current practice, we used the lowercase accents to compose accented capitals.”
- “One of the advantages of Unicode is that it includes a Generic Diacritical Marks set of ‘floating’ diacritics that can be combined with arbitrary letters. These are not case-sensitive, i.e. there is only one set of floating diacritics for both capitals and lowercase. In our first version of Lucida Sans Unicode, we implemented these as lowercase diacritics and adjusted their default position to float over the centre of a lowercase o. Ideally, there should be at least two sets of glyphs, one for lowercase and one for upper case (determined automatically by the text line layout manager of the OS or application), along with a set of kerning tables that optimizes the visual appearance of each combination of letter + diacritic.”
- “In a proportional font (like the Times Roman before the eyes of the reader), the advance width of a character is proportional to the geometric logic of its form. A proportionally-spaced m, which has a spatial frequency of three cycles per letter, is wider than an n, which has a frequency of two cycles, which in turn is wider than an i, which has one.”
- “In a fixed pitch font like Courier, all characters are of the same width, so that m is cramped and i extended, and ‘minimum’ has an irregular rhythm, since the spatial frequency of the letters is continually changing within the fixed width of the cells.”
- “Among the alphabetic Unicode character sets, Cyrillic poses an interesting problem in fixed-pitch mode because it has, compared to the Latin, a greater percentage of characters with higher spatial frequency (three or more cycles per letter) on the horizontal axis. The Hebrew alphabet, on the other hand, is more easily transformed to fixed-pitch mode because it has many fewer letters of high spatial frequency.”
- “While the assumption that characters are fully contained within cells was invariably true for primitive terminal and TTY fonts, it is not necessarily true of typographic fonts. In many PostScript and TrueType digital fonts, diacritics on capitals extend above the nominal upper boundary of the body of the font because, in an effort to reduce font file size, capital and lowercase accented characters are built as composites in which letters and diacritics are treated as subroutines,”
- “The lowercase form of most diacritics is taller than the capital form,”
- “A font that would contain all of Unicode 1.0 would be of daunting size, and the standard continues to grow as the Unicode committee adds more characters to it. Even without the Chinese/Japanese/Korean set, the alphabets and symbols comprise almost 4,000 separate characters, sixteen times larger than the usual 8-bit character sets.”
- “To call an incomplete font containing Unicode subsets a ‘Unicode’ font could be misleading, since some users could mistakenly assume that any font called ‘Unicode’ will contain a full set of 28,000 characters.”
- “The Japanese term ‘gothic’ is equivalent to ‘sans serif’, and is also used in English for sans serif faces, particularly those of 19th-century American design, e.g. Franklin Gothic and News Gothic.”
- “To satisfy the French critics and give Times greater appeal in the French market, the Monotype Corporation cut a special version of the face in accord with the dictates of Maximilien Vox, a noted French typographic authority [15,26]. Vox’s re-design of some fourteen characters brought Times slightly closer to the sophisticated style of the French Romain du Roi, cut by Philippe Grandjean circa 1693.”
- “For the German typographic market, Monotype cut a version of Times with lighter capitals that are less distracting in German orthography, where every noun is capitalized”
- “Robert Granjon” … “His ecclesiastical patrons in Rome called him ‘the excellent . . .’, ‘the most extraordinary . . .’, ‘the best ever . . .’ cutter of letters [31].”
- “we followed a traditional Hebrew thick/thin modulation, in which horizontal elements are thicker than vertical – the opposite of the Latin convention – but weighted the Hebrew characters to have visual ‘presence’ equivalent to that of the Latin.”
- “Unicode is a character encoding standard, not a glyph standard.”
- “For example, Unicode treats Latin capital B as one character and lowercase b as another because these are significant differences in Latin orthography.”
- “Unicode does not treat italic b or bold b as separate from b, because those letters are merely allographs, as the linguists would say, the graphic differences not being orthographically significant.”
- “S-cedilla and T-cedilla are used in both Turkish and Romanian, but the cedilla may be rendered as either the French form of cedilla or as a comma-like accent below the letter.”
- “Pike and Thompson [3] discuss the advantage of economical memory management and greater font loading speed when a complete Unicode character set is implemented as many small subfonts from which characters can be accessed independently, and this is the method used in the Plan 9 operating system, both for bitmap fonts and the Lucida Sans Unicode outline fonts. The other method is used in Microsoft Windows NT 3.1, in which the first version of the Lucida Sans Unicode font is implemented as a single TrueType font of 1,740 glyphs. In the current version of Windows NT, this allows a simpler font-handling mechanism, makes the automatic ‘hinting’ of the font easier, since all characters can be analyzed by the hinting software in one pass, and preserves the default design coordination of the subsets, if the font is based on a harmonized set of designs.”
summary
A neat paper. Due to my lack of understanding of the status quo in 1993, I cannot judge the approach and quality. There are some considerations I am not familiar with (e.g. why do we need to distinguish only two kinds of diacritics, lowercase and uppercase), but the paper gives a broad overview of the design decisions that need to be made when designing a ‘Unicode’ font. They designed 1,700 characters, but Unicode 1.1 specifies 34,168 characters. It is part of the paper to discuss “One big font vs. many little fonts”.
- “Our initial motivation in designing a Unicode font was to provide a standardized set of glyphs that could be used as a default core font for different operating systems and languages. Within one font, we wanted the different alphabets and symbols to maintain a single design theme.”
- “Having described reasons in favor of creating a Unicode font, we should also discuss arguments against such an undertaking, and various problems we encountered.”
- “Ars longa, vita brevis” (i.e. huge development effort)
- “Culture-bound design” (i.e. typographers only ‘truely’ understand the writing system they use in their culture)
- “Homogenization” (“If it erases distinctive differences between scripts, it increases the possibility of confusion”)
- “Character standard vs. glyph standard”
- “One big font vs. many little fonts”
The problem with unicode §
Title: “The problem with unicode” by N. Holmes [url] [dblp]
Published in 2003-06 at and I read it in 2021-08
Abstract:
quotes
- “I suspected that the Unicode people had withdrawn from the debate when they realized that by blunder I did not mean failure.”
- “The official Unicode site states that it is an encoding system that ‘provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language’ (www.unicode.org/unicode/standard/WhatIsUnicode.html).”
- “Font classes such as typewriter, serif, and sans serif have as little meaning in the Arab writing system as diwani, kufic, and thuluth have in the Latin writing system.”
- “At the most populous end of the spectrum lie plain text messages such as I have to deal with every day: letters, e-mail, handwritten notes. Plain text of this kind, being mostly brief and personal, never mixes writing systems.”
- “Documents are best marked up in a single writing system, with any mixing of writing systems specified through markup directly or, better, by using macrodefinitions or specifying an inclusion.”
- “By putting all writing systems and languages together, Unicode becomes much too complex and unstable.”
- “In addition, no attempt should be made to implement any particular collating sequences. Not only are these complex, they also differ from culture to culture. For example, German treats ä as though it were a, while Finnish treats the two as distinct. English treats rh as two letters, while Welsh treats them as one. Thus, the placement of symbols within alphabets should be chosen to support transliteration.”
- “The generative capability of this approach provides for complex use of accents as in Vietnamese and for the stable generation of new transliterations and symbols, thanks to typography’s ability to provide esthetically pleasing forms of newly popular compound symbols such as the euro.”
summary
In this article, the author argues that Unicode is a blunder: too complex and unstable. First, he contrasts markup and plaintext. Then he goes on to discuss the Latin writing system, suggesting an eight-bit categorization system (without actually assigning glyphs). He continues to discuss keyboards, leading towards CJK setups. In the end, he claims that Unicode does not take full advantage of the systematic graphical features of the various writing systems.
At least from today's perspective, the author claims to solve typesetting problems without actually offering a solution in detail. The suggested modifiers in Table 1 require further discussion by the typographers. Does the font specify the positioning of diacritics or is it part of his suggested scheme? In the end, these discussions are today solved in OpenType and the original bit-level encoding does not matter. His suggestion for various 8-bit systems requires a proper annotation of encodings per text file.
In the end, I think the author has some valid points, but his statements lack depth and time has solved these issues differently than he suggested.
typo
esthetically → aesthetically
Too Much Crypto §
Title: “Too Much Crypto” by Jean-Philippe Aumasson [url] [dblp]
Published in 2020-01-03 at Real-World Crypto 2020 and I read it in 2020/01
Abstract: We show that many symmetric cryptography primitives would not be less safe with significantly fewer rounds. To support this claim, we review the cryptanalysis progress in the last 20 years, examine the reasons behind the current number of rounds, and analyze the risk of doing fewer rounds. Advocating a rational and scientific approach to round numbers selection, we propose revised number of rounds for AES, BLAKE2, ChaCha, and SHA-3, which offer more consistent security margins across primitives and make them much faster, without increasing the security risk.
ambiguity
“Where we examine the reasons behind the number of rounds, comment on the risk posed by quantum computers, and finally propose new primitives for a future where less energy is wasted on computing superfluous rounds.”
Is this an English sentence?
notes
- “Designed in the 1970’s, neither DES nor GOST are practically broken by cryptanalysis.”
- “(We restrict this reassuring outlook to symmetric primitives, and acknowledge that spectacular failures can happen for more sophisticated constructions. An example is characteristic-2 supersingular curves’ fall from 128-bit to 59-bit security [32].)”
- “The speed of symmetric primitives being inversely proportional to their number of rounds, a natural yet understudied question is whether fewer rounds would be sufficient assurance against cryptanalysis’ progress.”
- “We conclude by proposing reduced-round versions of AES, BLAKE2, ChaCha, and SHA-3 that are significantly faster yet as safe as their full-round versions.”
- “in 2009 Bruce Schneier wrote this [51]:”
Cryptography is all about safety margins. If you can break n rounds of a cipher, you design it with 2n or 3n rounds. What we’re learning is that the safety margin of AES is much less than previously believed. And while there is no reason to scrap AES in favor of another algorithm, NIST should increase the number of rounds of all three AES variants. At this point, I suggest AES-128 at 16 rounds, AES-192 at 20 rounds, and AES-256 at 28 rounds. Or maybe even more; we don’t want to be revising the standard again and again.
- “128-bit security is often acknowledged as sufficient for most applications”
- “at the time of writing mining a Bitcoin block requires approximately 2^74 evaluations of SHA-256.”
- “Lloyd [46] estimated that ‘[the] Universe could currently register 10^90 [or 2^299] bits. To register this amount of information requires every degree of freedom of every particle in the Universe’. Applying a more general bound and the holographic principle, Lloyd further calculates that the observable Universe could register approximately 2^399 bits by using all the information capacity matter, energy, and gravity.”
- “Unlike complexity theoretic estimates that use asymptotic notations such as O(n log n) where n is the problem size, cryptanalysts work with fixed-length values and can’t work with asymptotics.”
- “complexities in cryptanalysis papers ignore the fact that a memory access at a random address is typically orders of magnitude slower than simple arithmetic operations.”
- “This example stresses that the area-time (AT) metric model, where the attack cost is viewed as the product between area and time requirements, is more realistic than the model where only time is considered”
- “Grigg and Gutmann called ‘cryptographic numerology’”
- “Using any standard commercial risk management model, cryptosystem failure is orders of magnitude below any other risk.”
- “For example, the greatest risks with e-voting systems are not the cryptographic protocols and key lengths, but the operational and information security concerns.”
- “Our proposed categories are:
- Analyzed: […]
- Attacked: […]
- Wounded: […]
- Broken: […]”
- “Schneier’s law is the tautological-sounding statement ‘Attacks always get better, they never get worse’, which Bruce Schneier popularized, and that (he heard) comes from inside the NSA.”
- “Rarely have number of rounds been challenged as too high. A possible reason (simplifying) is that people competent to constructively question the number of rounds have no incentive to promote faster cryptography, and that those who don’t have the expertise to confidently suggest fewer rounds.”
- “Although Grover reduces key search from O(2^n) to O(2^(n/2)), one shouldn’t ignore the constant factors hiding in the O(). Translating this asymptotic speed-up into a square-root of the actual cost is a gross oversimplification; between constant factors, the size and cost of a quantum circuit implementing the attacked primitive, the lack of parallelism [30], and the latency of the circuit, it’s actually unclear, given today’s quantum computing engineering knowledge, whether Grover would actually be more cost-efficient than classical computers. It’s nonetheless a safe bet to assume that it would be.”
- “Anyway, the number of rounds would not matter much would AES be Groverable, the answer to that question is therefore not important in choosing a number of rounds.”
- “[…] we propose the following:
- AES: 9 rounds instead of 10 for AES-128, 10 instead of 12 for AES-192, 11 instead of 14 for AES-256, yielding respectively a 1.1×, 1.2×, and 1.3× speed-up.
- BLAKE2: 8 rounds instead of 12 for BLAKE2b, 7 rounds instead of 10 for BLAKE2s (we’ll call these versions BLAKE2bf and BLAKE2sf), yielding respectively a 1.5× and 1.4× speed-up.
- ChaCha: 8 rounds instead of 20 (that is, ChaCha8), yielding a 2.5× speed-up.
- SHA-3: 10 rounds instead of 24 (we’ll call this version KitTen, inspired by Keccak family member KangarooTwelve), yielding a 2.4× speed-up.”
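Since the proposal amounts to changing one constant, a minimal sketch of the ChaCha core (my own, not from the paper) shows where the rounds parameter enters; running the loop 8 times instead of 20 is what yields the quoted 2.5× speed-up.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

static void quarter_round(uint32_t s[16], int a, int b, int c, int d) {
    s[a] += s[b]; s[d] ^= s[a]; s[d] = ROTL32(s[d], 16);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = ROTL32(s[b], 12);
    s[a] += s[b]; s[d] ^= s[a]; s[d] = ROTL32(s[d], 8);
    s[c] += s[d]; s[b] ^= s[c]; s[b] = ROTL32(s[b], 7);
}

/* One ChaCha block; `rounds` is 20 for the original, 8 for the proposal. */
static void chacha_block(uint32_t out[16], const uint32_t in[16], int rounds) {
    uint32_t x[16];
    memcpy(x, in, sizeof x);
    for (int i = 0; i < rounds; i += 2) {   /* two rounds per iteration */
        quarter_round(x, 0, 4,  8, 12);     /* column round */
        quarter_round(x, 1, 5,  9, 13);
        quarter_round(x, 2, 6, 10, 14);
        quarter_round(x, 3, 7, 11, 15);
        quarter_round(x, 0, 5, 10, 15);     /* diagonal round */
        quarter_round(x, 1, 6, 11, 12);
        quarter_round(x, 2, 7,  8, 13);
        quarter_round(x, 3, 4,  9, 14);
    }
    for (int i = 0; i < 16; i++)            /* feed-forward of the input */
        out[i] = x[i] + in[i];
}

int main(void) {
    uint32_t state[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
    uint32_t ks8[16], ks20[16];             /* key, counter, nonce words left zero */
    chacha_block(ks8,  state,  8);
    chacha_block(ks20, state, 20);
    printf("ChaCha8  word 0: %08x\nChaCha20 word 0: %08x\n", ks8[0], ks20[0]);
    return 0;
}
```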
Good paper; its survey is its strong suit. However, the particular choice of proposed parameters has little justification.
typo
If all the best cryptanalysts could find was a distinguisher in the chosen-ciphertext related-key model, then even less likely are practical key recovery attack in the chosen-plaintext model.
typo
But as as noted in §2, numbers such as
Toward decent text encoding §
Title: “Toward decent text encoding” by N. Holmes [url] [dblp]
Published in 1998 at and I read it in 2020-08
Abstract:
quotes
- “Now, without much public discussion or dispute, the computing industry seems to be moving to an equally poor but contrastingly obese character set called Unicode.”
- “In the 1960s, two expanded character sets came into wide use. When IBM introduced the 8-bit System/360 computers, it introduced an 8-bit character set called EBCDIC (Extended Binary Coded Decimal Interchange Code) to go with it.”
- “Both EBCDIC and ASCII provided users with a + symbol as standard, but (with breathtaking arrogance) the developers of both sets refused to provide the traditional multiplication and division symbols.”
- “What is disappointing, if not tragic, is that the replacement is so unsuitable for text encoding.”
- “Unicode seems to be trying to provide a single character set to represent documents in any language or writing system or mixture thereof.”
- “Unicode is intended primarily to allow the computing and telecommunications industry to get by with only one character set for the entire world (http://www.unicode.org). One result is that everyone has to use 16 bits for every character.”
- “Mudawwar’s Multicode aims to counter the 16-bit drawback and several others that he describes in some detail.”
- “Most traffic in text is raw text—messages, identifiers, business records—and the vast majority of this traffic is monolingual.”
- “Mudawwar’s Multicode scheme recognizes this and therefore provides for a separate character set for every ‘official language’ (‘Unicode Misunderstood,’ Computer, June 1997).”
- “In this case Multicode provides for great data compression, but in any case it separates languages from one another, which is no longer the way of the world, if it ever was. There are two aspects of language interchange. First, languages borrow words and phrases from one another so that, for example, English uses French and German words and takes their diacritical marks with them.”
- “Second, in this international society it is important to be able to name people and organizations in their own language.”
- “I should be able to read all Swedish names in plaintext e-mail messages, but at present many are garbled.”
- “For text encoding, the world needs a standard for each writing system that suits each and every language using that system.”
- “The one exception is the traditional Chinese writing system, which encompasses thousands of distinct characters.”
summary
This article debates requirements for a decent text encoding scheme. It criticizes Unicode and argues in favor of Multicode.
Reading this 1998 article in 2021 certainly creates some issues. First and foremost, it is not understandable to me why the author equates Unicode with a 16-bit encoding (last page, left column). Unicode is called an “obese character set” and 8-bit encodings are praised. The author neglects UTF-8, invented in 1992, which does not have the “obese” property. His lack of understanding of writing systems shows when he describes “the traditional Chinese writing system” as “the one exception” “which encompasses thousands of distinct characters”. This statement excludes the CJK community, which includes Kanji in Japanese text and Hanja in Hangul (admittedly reduced to minority use since 1970 and limited to South Korea).
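To illustrate the UTF-8 point, a tiny sketch of my own: a Unicode encoding need not spend 16 bits on every character; ASCII stays at one byte and other characters take two to four.

```c
#include <stdio.h>
#include <string.h>

/* The string literals contain the UTF-8 byte sequences for 'a', 'ä',
   and the euro sign. */
int main(void) {
    const char    *samples[]    = { "a", "\xC3\xA4", "\xE2\x82\xAC" };
    const unsigned codepoints[] = { 0x0061, 0x00E4, 0x20AC };
    for (int i = 0; i < 3; i++)
        printf("U+%04X (%s) -> %zu byte(s) in UTF-8\n",
               codepoints[i], samples[i], strlen(samples[i]));
    return 0;
}
```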
At the same time Holmes praises 8-bit encoding systems like the Multicode scheme, which makes computations like “convert to lowercase” difficult to implement (thus ignoring computational burden).
It seems to me the author did not properly research his topic of interest. But I agree upon the goal mentioned in the article:
“For text encoding, the world needs a standard for each writing system that suits each and every language using that system.”
Tweaks and Keys for Block Ciphers: The TWEAKEY Framewo… §
Title: “Tweaks and Keys for Block Ciphers: The TWEAKEY Framework” by Jérémy Jean, Ivica Nikolić, Thomas Peyrin [url] [dblp]
Published in 2014 at ASIACRYPT 2014 and I read it in 2020-06
Abstract: We propose the TWEAKEY framework with goal to unify the design of tweakable block ciphers and of block ciphers resistant to related-key attacks. Our framework is simple, extends the key-alternating construction, and allows to build a primitive with arbitrary tweak and key sizes, given the public round permutation (for instance, the AES round). Increasing the sizes renders the security analysis very difficult and thus we identify a subclass of TWEAKEY, that we name STK, which solves the size issue by the use of finite field multiplications on low hamming weight constants. We give very efficient instances of STK, in particular, a 128-bit tweak/key/state block cipher Deoxys-BC that is the first AES-based ad-hoc tweakable block cipher. At the same time, Deoxys-BC could be seen as a secure alternative to AES-256, which is known to be insecure in the related-key model. As another member of the TWEAKEY framework, we describe Kiasu-BC, which is a very simple and even more efficient tweakable variation of AES-128 when the tweak size is limited to 64 bits.
Properties of tweakable block ciphers:
- formalized in 2002 by Liskov et al.
- tweaks are completely public, keys are not
- retweaking (changing the tweak value) is less costly than changing its secret key
- security model considers that the attacker has full control over both: the message and the tweak inputs
quotes (on contributions)
- “We propose the TWEAKEY framework with goal to unify the design of tweakable block ciphers and of block ciphers resistant to related-key attacks.”
- “We give very efficient instances of STK, in particular, a 128-bit tweak/key/state block cipher Deoxys-BC that is the first AES-based ad-hoc tweakable block cipher.”
- “we describe Kiasu-BC, which is a very simple and even more efficient tweakable variation of AES-128 when the tweak size is limited to 64 bits.”
quotes (on history)
- “[…] designs that allowed to prove their security against classical differential or linear attacks have been a very important step forward, […]”
- “The security of the block ciphers, both Feistel and Substitution-Permutation networks, has been well studied when the key is fixed and secret, however, when the attacker is allowed to ask for encryption or decryption with different (and related) keys the situation becomes more complicated.”
- “Most key schedule constructions are ad-hoc, in the sense that the designers came up with a key schedule that is quite different from the internal permutation of the cipher, in a hope that no meaningful structure is created by the interaction of the two components.”
- “This extra input T, later renamed as tweak, was supposed to be completely public and to randomize the instance of the block cipher: to different values of T correspond different and independent families of permutations E_K”
- “This feature was formalized in 2002 by Liskov et al., who showed that tweakable block ciphers are valuable building blocks if retweaking (changing the tweak value) is less costly than changing its secret key.”
- “disk encryption where each block is ciphered with the same key, but the block index is used as tweak value.”
- “Simple constructions of a tweakable block cipher E_K(T, P) based on a block cipher E_K(P), like XORing the tweak into the key input and/or message input, are not satisfactory. For example, only XORing the tweak into the key input would result in an undesirable property that E_K(T, P) = E_{K⊕X}(T ⊕ X, P).”
Non-intuitive results on birthday paradox bounds
- “More importantly, these methods ensure only security up to the birthday-bound (relative to the block cipher size).”
- “Minematsu [46] partially overcomes this limitation by proving beyond birthday-bound security for his design, but at the expense of a very reduced efficiency.”
Future work
- “As of today, it remains an open problem to design an ad-hoc AES-like tweakable block cipher, which in fact would be very valuable for authenticated encryption as AES-NI instruction sets guarantee extremely fast software implementations.”
Tweakey
- “we emphasize that not all TWEAKEY instances are secure”
- “E is a key-alternating cipher when the general form f(s_i, K_i) = s_{i+1} for i < r becomes f(s_i ⊕ K_i) = s_{i+1}”
- “The signature of standard block ciphers can be described as E: {0, 1}^k × {0, 1}^n → {0, 1}^n where an n-bit plaintext P is transformed into an n-bit ciphertext C = E(K, M) using a k-bit key K.”
- “The signature for a tweakable block cipher therefore becomes E: {0, 1}^k × {0, 1}^t × {0, 1}^n → {0, 1}^n, the ciphertext C = E(K, T, P) where the tweak T does not need to be secret and thus can be placed in the public domain.”
- “It is important to note that the security model considers that the attacker has full control over both the message and the tweak inputs.”
- related-tweakey := related-key related-tweak
- open-tweakey := open-key open-tweak
- Figure 3 shows the TWEAKEY design, where the top wires transmit t+k bits, g outputs k bits and the bottom wires transmit n bits
- subtweakey extraction function g
- internal state update permutation f
- tweakey state update function h
- “This can be summarized as: s_{i+1} = f(s_i ⊕ g(tk_i)) followed by tk_{i+1} = h(tk_i)”
- “The functions f, g and h must be chosen along with the number of rounds r such that no known attack can apply on the resulting primitives.”
- “One of the main causes for the low number of ad-hoc tweakable block ciphers is the fact that adding a tweak input makes the security analysis much harder.”
- “The trick we use is to apply a nibble-wise multiplication with a distinct coefficient α_j for all tweakey words.”
- “when we deal with differences in several tweakey words (which is supposedly very hard to analyze due to the important number of nibbles), the study of the STK construction is again the same as for a classical TK-1 analysis, except that at most p − 1 active output nibbles can be erased in each subgroup.”
- Figure 4 shows the STK construction
- “The chosen round functions (and the nibble sizes), suggest that Deoxys-BC is software oriented, while Joltik-BC is hardware (and lightweight) oriented design.”
- “For instance, XORing two columns (instead of rows) would immediately lead to an insecure variant.” (context Kiasu-BC)
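A schematic sketch of my own (not from the paper) of the quoted round structure s_{i+1} = f(s_i ⊕ g(tk_i)), tk_{i+1} = h(tk_i); the sizes and the bodies of f, g, h are dummy placeholders that a concrete instance such as Deoxys-BC would fill in with the AES round and its tweakey schedule.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define N_BYTES  16   /* n-bit internal state */
#define TK_BYTES 32   /* (t+k)-bit tweakey state */

static void f(uint8_t s[N_BYTES]) {                        /* state permutation (dummy) */
    for (int i = 0; i < N_BYTES; i++) s[i] = (uint8_t)(s[i] * 5 + 1);
}
static void g(const uint8_t tk[TK_BYTES], uint8_t out[N_BYTES]) {  /* subtweakey extraction (dummy) */
    for (int i = 0; i < N_BYTES; i++) out[i] = tk[i] ^ tk[i + N_BYTES];
}
static void h(uint8_t tk[TK_BYTES]) {                      /* tweakey state update (dummy) */
    uint8_t first = tk[0];
    memmove(tk, tk + 1, TK_BYTES - 1);                     /* rotate by one byte */
    tk[TK_BYTES - 1] = first;
}

static void tweakey_encrypt(uint8_t s[N_BYTES], uint8_t tk[TK_BYTES], int rounds) {
    uint8_t sub[N_BYTES];
    for (int i = 0; i < rounds; i++) {
        g(tk, sub);
        for (int j = 0; j < N_BYTES; j++) s[j] ^= sub[j];  /* s_i XOR g(tk_i) */
        f(s);                                              /* s_{i+1} = f(...) */
        h(tk);                                             /* tk_{i+1} = h(tk_i) */
    }
    g(tk, sub);                                            /* final subtweakey addition,
                                                              as in key-alternating ciphers */
    for (int j = 0; j < N_BYTES; j++) s[j] ^= sub[j];
}

int main(void) {
    uint8_t state[N_BYTES] = {0}, tweakey[TK_BYTES] = {1, 2, 3};
    tweakey_encrypt(state, tweakey, 14);
    printf("first state byte after 14 rounds: %02x\n", state[0]);
    return 0;
}
```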
Performance and area
- “a complete 128-bit tweak 128-bit key 128-bit block cipher proposal Deoxys-BC based on the AES round function, but faster and more lightweight than other tentatives to build a tweakable block cipher from AES-128. When used in ΘCB3 [38] authenticated encryption, Deoxys-BC runs at about 1.3 c/B on the latest Intel processors. This has to be compared to OCB3, which runs at 0.7-0.88 c/B when instantiated with AES-128, but only ensures birthday-bound security. Alternatively, Deoxys-BC could be a replacement for AES-256, which has related-key issues as shown in [8].”
- “On longer inputs and modes based on parallelizable block cipher calls (such as ΘCB3), Deoxys-BC-256 runs at around 1.3 cycles per byte, while Deoxys-BC-384 at around 1.55 cycles per byte. This is to be compared to AES in OCB3 mode, which runs at around 0.70 - 0.88 cycles per byte (but has only birthday bound security).”
- “Therefore, we estimate that the entire Deoxys-BC-256 can be implemented with around 3400 GE, and Deoxys-BC-384 with around 4400 GE.”
summary
I think this paper is a good piece of research. I am not an expert on symmetric cryptography and cannot judge the security analysis and possible attacks, but to me it seems to consider relevant properties. Unrelated to the paper, I was not aware of beyond-birthday-bound security, which totally intrigued me. Related to the paper, follow-up work could address the question “What are sufficient conditions to achieve a secure tweakable block cipher with TWEAKEY?”. Well done!
- TWEAKEY framework
- Kiasu-BC (tweak size = 64 bits, AES-128 based)
- STK (goal: ease of cryptanalysis with existing tools)
- Deoxys-BC (n=128 bit blocks input, f = AES round function)
- Deoxys-BC-256
- Deoxys-BC-384
- Joltik-BC (n = 64 bits, f = AES-like using 4-bit nibbles)
- Joltik-BC-128 (r = 24 rounds)
- Joltik-BC-192 (r = 32)
- “The QARMA Block Cipher Family” by Roberto Avanzi is a follow-up on this work
typo
- “these scheme might not be really efficient” → “these schemes might not be really efficient”
- “This might be seen as counter intuitive as it is required the tweak input to be somehow more efficient than the key input,” → “This might be seen as counter intuitive as it requires the tweak input to be somehow more efficient than the key input,”
- “but at the same time the security requirement on the tweak seem somehow stronger than on the key,” → “but at the same time the security requirement on the tweak seems somehow stronger than on the key,”
- “and then multiply each c-bit cell of the j-th” → and then multiplies each c-bit cell of the j-th
- “Most automated differential analysis tools for AES-like ciphers (e.g., [9, 23,28]) use truncated differential
representation to make feasible the search for differential characteristics.” → “Most automated differential analysis tools for AES-like ciphers (e.g., [9, 23,28]) use truncated differential representation to make the search for differential characteristics feasible.”
Underproduction: An Approach for Measuring Risk in Ope… §
Title: “Underproduction: An Approach for Measuring Risk in Open Source Software” by Kaylea Champion, Benjamin Mako Hill [url] [dblp]
Published in 2021-02 at SANER 2021 and I read it in 2022-04
Abstract: The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call ‘underproduction’ which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis demonstrates both the utility of our approach and reveals the existence of widespread underproduction in a range of widely installed software components in Debian.
Lehman’s laws of software evolution
(via “Laws of Software Evolution Revisited” by M M Lehman)
“All relate specifically to E-type systems, that is, broadly speaking, to software systems that solve a problem or implement a computer application in the real world”:
- Continuing Change: An E-type program that is used must be continually adapted else it becomes progressively less satisfactory.
- Increasing Complexity: As a program is evolved its complexity increases unless work is done to maintain or reduce it.
- Self Regulation: The program evolution process is self regulating with close to normal distribution of measures of product and process attributes.
- Conservation of Organisational Stability: The average effective global activity rate on an evolving system is invariant over the product life time.
- Conservation of Familiarity: During the active life of an evolving program, the content of successive releases is statistically invariant
- Continuing Growth: Functional content of a program must be continually increased to maintain user satisfaction over its lifetime.
- Declining Quality: E-type programs will be perceived as of declining quality unless rigorously maintained and adapted to a changing operational environment.
- Feedback System: E-type Programming Processes constitute Multi-loop, Multi-level Feedback systems and must be treated as such to be successfully modified or improved.
peer production
… as defined by Yochai Benkler (describing the production model FLOSS communities discovered in the early 1990s):
- decentralized goal setting and execution
- a diverse range of participant motives, including non-financial ones
- non-exclusive approaches to property (e.g. copyleft or permissive licensing)
- governance through participation, notions of meritocracy, and charisma (rather than through property or contract)
quotes
- “The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call ‘underproduction’ which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs.” (Champion and Hill, 2021, p. 1)
- “In this paper, we describe an approach for identifying other important but poorly maintained FLOSS packages” (Champion and Hill, 2021, p. 1)
- “In an early and influential practitioner account, Raymond argued that FLOSS would reach high quality through a process he dubbed ‘Linus’ law’ and defined as ‘given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone’ [5]. Benkler coined the term ‘peer production’ to describe the method through which many small contributions from large groups of diversely motivated individuals could be integrated together into high quality information goods like software [6]. A growing body of research suggests reasons to be skeptical about Linus’ law [7] and the idea that simply opening the door to one’s code will attract a crowd of contributors [8, 9].” (Champion and Hill, 2021, p. 1)
- “Over time, it has become clear that peer produced FLOSS projects’ reliance on volunteer labor and self-selection into tasks has introduced types of risk that traditional software engineering processes have typically not faced. Foremost among these is what we call ‘underproduction.’ We use the term underproduction to refer to the fact that although a large portion of volunteer labor is dedicated to the most widely used open source projects, there are many places where the supply of quality software and volunteer labor is far out of alignment with demand. Because underproduction may go unnoticed or unaddressed until it is too late, we argue that it represents substantial risk to the stability and security of software infrastructure.” (Champion and Hill, 2021, p. 2)
- “How can we measure underproduction in FLOSS?” (Champion and Hill, 2021, p. 2)
- “Our paper contributes to software engineering research in three distinct ways. First, we describe a broad conceptual framework to identify relative underproduction in peer produced FLOSS repositories: identifying software packages of lower relative quality than one would expect given their relative popularity. Second, we describe an application of this conceptual framework to a dataset of 21,902 source packages from the Debian GNU/Linux distribution using measures derived from multilevel Bayesian regression survival models. Finally, we present results from two experiments. The first experiment identifies a pool of relatively underproduced software in Debian. The second experiment seeks to validate our application of our framework for identifying underproduction by correlating underproduction with an alternate indicator of risk.” (Champion and Hill, 2021, p. 2)
- “FLOSS began with the free software movement in the 1980s and its efforts to build the GNU operating system as a replacement for commercial UNIX operating systems [23]. Over time, free software developers discovered that their free licenses and practices of working openly supported new forms of mass collaboration and bug fixes [4].” (Champion and Hill, 2021, p. 2)
- “‘Peer production’ is a term coined by Yochai Benkler to describe the model of organizing production discovered by FLOSS communities in the early 1990s that involved the mass aggregation of many small contributions from diversely motivated individuals. Benkler [9] defines peer production for online groups in terms of four criteria: (1) decentralized goal setting and execution, (2) a diverse range of participant motives, including non-financial ones, (3) non-exclusive approaches to property (e.g. copyleft or permissive licensing), and (4) governance through participation, notions of meritocracy, and charisma, rather than through property or contract.” (Champion and Hill, 2021, p. 2)
- “The process of building and maintaining software is often collaborative and social, including not only code but code comments, commit messages, pull requests, and code reviews, as well as bug reporting, issue discussing, and shared problem-solving [24].” (Champion and Hill, 2021, p. 2)
- “The team found that the laws are frequently not upheld in FLOSS, especially when effort from outside a core team is considered. This work suggests that the effort available to maintain a piece of FLOSS software may increase as it grows in popularity.” (Champion and Hill, 2021, p. 3)
- “Prior studies have suggested that bug resolution rate is closely associated of a range of important software engineering outcomes, including codebase growth, code quality, release rate, and developer productivity [32, 33, 34]. By contrast, lack of maintenance activity as reflected in a FLOSS project’s bug tracking system can be considered a sign of failure [35].” (Champion and Hill, 2021, p. 3)
- “In particular, we are inspired by a study of Wikipedia by Warncke-Wang et al. [36] who build off previous work by Gorbatâi [37] to formalize what Warncke-Wang calls the “perfect alignment hypothesis” (PAH). The PAH proposes that the most heavily used peer produced information goods (for Warncke-Wang et al., articles in Wikipedia) will be the highest quality, that the least used will be the lowest quality, and so on. In other words, the PAH proposes that if we rank peer production products in terms of both quality and importance—for example, in the simple conceptual diagram shown in Figure 1—the two ranked lists will be perfectly correlated.” (Champion and Hill, 2021, p. 3)
- “Despite the central role that FLOSS plays in peer production, we know of no efforts to conceptualize or measure underproduction in software.” (Champion and Hill, 2021, p. 3)
- “A low quality Wikipedia article on an extremely popular subject seems likely to pose much less risk to society than a bug like the Heartbleed vulnerability described earlier which could occur when FLOSS is underproduced.” (Champion and Hill, 2021, p. 3)
- “The measure of deviation resulting from this process serves as our measure of (mis-)alignment between quality and importance (i.e., over- or underproduction).” (Champion and Hill, 2021, p. 3)
- “With a community in operation since 1993, Debian is widely used and is the basis for other widely used distributions like Ubuntu. Debian had more than 1,400 different contributors in 2020 and contains more than 20,000 of the most important and widely used FLOSS packages.” (Champion and Hill, 2021, p. 4)
- “A single source package may produce many binary packages. For examples, although it is an outlier, the Linux kernel source package produces up to 1,246 binary packages from its single source package (most are architecture specific subcollections of kernel modules).” (Champion and Hill, 2021, p. 4)
- “However, software engineering researchers have noted that the quantity of bugs reported against a particular piece of FLOSS may be more related to the number of users of a package [50, 52], or the level of effort being expended on bug-finding [1] in ways that limit its ability to serve as a clear signal of software quality. In fact, Walden [1] found that OpenSSL had a lower bug count before Heartbleed than after. Walden [1] argued that measures of project activity and process improvements are a more useful sign of community recovery and software quality than bug count.” (Champion and Hill, 2021, p. 4)
- “Time to resolution has been cited as an important measure of FLOSS quality by a series of software engineering scholars” (Champion and Hill, 2021, p. 4)
- “A second challenge in measuring quality as time to resolution comes from the fact that the distribution of bugs across packages is highly unequal. Most of the packages we examine (14,604 of 21,902) have 10 or fewer bugs and more than one out of six (3,857 of 21,902) have only one bug reported.” (Champion and Hill, 2021, p. 5)
- “Given this construction, Uj will be zero when a package is fully aligned, negative if it is overproduced, and positive if it is underproduced.” (Champion and Hill, 2021, p. 6)
- “Our first experiment describes results from the application of our method described in §V and suggests that a minimum of 4,327 packages in Debian are underproduced.” (Champion and Hill, 2021, p. 6)
- “Underproduction is a concept borrowed from economics and involves a relationship between supply and demand.” (Champion and Hill, 2021, p. 8)
- “For example, resolution time is an imperfect and partial measure of quality.” (Champion and Hill, 2021, p. 8)
- “Our results suggest that underproduction is extremely widespread in Debian. Our non-parametric survival analysis shown in Figure 2 suggests that Debian resolves most bugs quickly and that release-critical bugs in Debian are fixed much more quickly than non-release-critical bugs. The presence of substantial underproduction in widely-installed components of Debian exposes Debian’s users to risk.” (Champion and Hill, 2021, p. 9)
- “One striking feature of our results is the predominance of visual and desktop-oriented components among the most underproduced packages (see Figure 5). Of the 30 most underproduced packages in Debian, 12 are directly part of the XWindows, GNOME, or KDE desktop windowing systems. For example, the “worst” ranking package, GNOME Power Manager (gnome-power-manager) tracks power usage statistics, allows configuration of power preferences, screenlocking, screensavers, and alerts users to power events such as an unplugged AC adaptor.” (Champion and Hill, 2021, p. 9)
- “These results might simply reflect the difficulty of maintaining desktop-related packages. For example, maintaining gnome-power-manager includes complex integration work that spans from a wide range of low-level kernel features to high-level user-facing and usability issues.” (Champion and Hill, 2021, p. 9)
- “FLOSS acts as global digital infrastructure. Failures in that infrastructure ripple through supply chains and across sectors.” (Champion and Hill, 2021, p. 9)
summary
The paper adapts the notion of underproduction from economics to software projects. Roughly, the notion captures packages whose engineering activity and attention fall short of the demand for them (supply below demand). To quantify underproduction empirically, the authors compared time to bug resolution with installation counts for 21,902 Debian packages; 4,327 packages were identified as underproduced (about 20%).
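To make the rank-alignment idea concrete, here is a minimal Python sketch of my own with made-up data. It uses the median bug resolution time as a quality proxy, whereas the paper derives quality from multilevel Bayesian survival models, so this only illustrates the U rank difference (zero = aligned, positive = underproduced, negative = overproduced).

```python
# Minimal sketch (not the paper's method) of the rank-alignment idea behind
# underproduction. Quality is proxied by the median bug resolution time;
# the paper instead uses multilevel Bayesian survival models.
from statistics import median

# hypothetical packages: name -> (install count, bug resolution times in days)
packages = {
    "pkg-a": (900_000, [3, 10, 8]),
    "pkg-b": (850_000, [400, 120, 310]),
    "pkg-c": (12_000, [5, 9]),
    "pkg-d": (300, [700, 365]),
}

def ranks(scores, reverse):
    """Map each key to its 1-based rank (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=reverse)
    return {name: i + 1 for i, name in enumerate(ordered)}

importance = {name: installs for name, (installs, _) in packages.items()}
quality = {name: median(times) for name, (_, times) in packages.items()}

rank_importance = ranks(importance, reverse=True)  # most installed -> rank 1
rank_quality = ranks(quality, reverse=False)       # fastest resolution -> rank 1

# U > 0: underproduced, U = 0: aligned, U < 0: overproduced
for name in packages:
    u = rank_quality[name] - rank_importance[name]
    print(f"{name}: importance rank {rank_importance[name]}, "
          f"quality rank {rank_quality[name]}, U = {u:+d}")
```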
All decisions are made with proper rationale and the limitations are discussed in section 7. The normalization of the data, in particular the assignment of bugs to packages through the BTS, must have taken a large effort. However, my confidence in the statistical model is rather low (for example, I am not sure a uniform model is appropriate for packages with such divergent bug-reporting behavior, as explained on page 5). A ‘control group’ with commercial software projects would be nice, but is obviously infeasible. I would like to point out that this is purely subjective and I cannot support it with a formal statement since my empirical background is limited.
The paper is well-written except for the screwup on page 5.
The result that the most underproduced packages are GUI applications is interesting.
Vnodes: An Architecture for Multiple File System Types… §
Title: “Vnodes: An Architecture for Multiple File System Types in Sun UNIX” by S R Kleiman [url] [dblp]
Published in 1986 at USENIX summer 1986 and I read it in 2023-07
Abstract:
quotes
- “vfs_sync(vfsp): Write out all cached information for vfsp. Note that this is not necessarily done synchronously. When the operation returns all data has not necessarily been written out, however it has been scheduled.” (Kleiman, 1986, p. 7)
- “The current interface has been in operation since the summer of 1984, and is a released Sun product.” (Kleiman, 1986, p. 9)
- “Vnodes has been proven to provide a clean, well defined interface to different file system implementations.” (Kleiman, 1986, p. 9)
- “In addition, a prototype "/proc" file system[5] has been implemented.” (Kleiman, 1986, p. 9)
summary
In this paper, the author S. R. Kleiman explains the filesystem interface developed for Sun UNIX. The basic idea is a uniform vfs/vnode interface, defined as C structs of operations, which every filesystem type implements. The interface with its proposed semantics is presented.
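To visualize the architecture, here is a toy Python analogue of the dispatch idea: the real interface consists of C structs of function pointers, and the operation names below are illustrative, not the actual Sun vnodeops entry points.

```python
# Toy analogue of the vnode/vfs idea: one uniform set of operations that every
# file system type implements; file-system-independent code only ever talks to
# the interface. Operation names are illustrative, not Sun's actual entry points.
from abc import ABC, abstractmethod

class FsOps(ABC):
    @abstractmethod
    def open(self, path: str) -> str: ...

    @abstractmethod
    def read(self, handle: str, nbytes: int) -> bytes: ...

    @abstractmethod
    def sync(self) -> None:
        """Schedule cached data to be written out; like the quoted vfs_sync,
        the writes need not have completed when this returns."""

class LocalFs(FsOps):
    def open(self, path): return "local:" + path
    def read(self, handle, nbytes): return b"x" * nbytes
    def sync(self): print("local fs: writes scheduled")

class NetworkFs(FsOps):
    def open(self, path): return "nfs:" + path
    def read(self, handle, nbytes): return b"y" * nbytes
    def sync(self): print("network fs: writes scheduled")

# Generic, file-system-independent code works against the uniform interface only.
for fs in (LocalFs(), NetworkFs()):
    handle = fs.open("/etc/motd")
    print(fs.read(handle, 4))
    fs.sync()
```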
Very nice to see a white paper on such fundamental software architecture.
I am not familiar with filesystem requirements, but the interface, examples, and some rationale are provided. I was surprised that vfs_sync is not synchronous. The fid structure was not understandable to me either (does the unique file ID really have a length of 1 byte?).
typo
page 6, “the file pointer is is changed”
When a Patch is Not Enough - HardFails: Software-Explo… §
Title: “When a Patch is Not Enough - HardFails: Software-Exploitable Hardware Bugs” by Ghada Dessouky, David Gens, Patrick Haney, Garrett Persyn, Arun Kanuparthi, Hareesh Khattri, Jason M. Fung, Ahmad-Reza Sadeghi, Jeyavijayan Rajendran [url] [dblp]
Published in 2018-12-01 at and I read it in 2020-07
Abstract: Modern computer systems are becoming faster, more efficient, and increasingly interconnected with each generation. Consequently, these platforms also grow more complex, with continuously new features introducing the possibility of new bugs. Hence, the semiconductor industry employs a combination of different verification techniques to ensure the security of System-on-Chip (SoC) designs during the development life cycle. However, a growing number of increasingly sophisticated attacks are starting to leverage cross-layer bugs by exploiting subtle interactions between hardware and software, as recently demonstrated through a series of real-world exploits with significant security impact that affected all major hardware vendors.
HCF
“A behavior humorously hinted at in IBM System/360 machines in the form of a Halt-and-Catch-Fire (HCF) instruction.”
→ See also Christopher Domas' research on x86 since 2017
→ Pentium F00F (C7C8) bug
quotes
- “This approach does not ensure security at the hardware implementation level. Hardware vulnerabilities can be introduced due to: (a) incorrect or ambiguous security specifications, (b) incorrect design, (c) faulty implementation of the design, or (d) a combination thereof.”
- “To detect such bugs, the semiconductor industry makes extensive use of a variety of verification and analysis techniques, such as simulation and emulation (also called dynamic verification)”
- “industry-standard tools include Incisive, Solidify, Questa Simulation and Questa Formal, OneSpin 360, and JasperGold”
- “This process incorporates a combination of many different techniques and toolsets such as RTL manual code audits, assertion-based testing, dynamic simulation, and automated security verification.”
- “recent outbreak of cross-layer bugs” with 15 references appended ^^
- “To reproduce this effect, we implemented the list of bugs using two popular and freely available processor designs for the widely used open-source RISC-V architecture.”
- “Specifically, we observe that RTL bugs arising from complex and cross-modular interactions in real-world SoCs render RTL bugs extremely difficult to detect in practice. Further, it may often be feasible to exploit them from software to compromise the entire platform, and we call such bugs HardFails.”
- “As all vendors keep their proprietary industry designs and implementations inaccessible, we use the popular open-source RISC-V architecture and hardware micro-architecture as a baseline”
- “We investigated how these vulnerabilities can be effectively detected using formal verification techniques (Section V) using an industry-standard tool and in a second case study through simulation and manual RTL analysis (Section VI).”
- “As a result, real-world SoCs can easily approach 100,000 lines of RTL code, and some open-source designs significantly outgrow this to many millions lines of code”
- “However, since RTL code is usually compiled and hardwired as integrated circuitry logic, the underlying bugs will remain and cannot, in principle, be patched after production. This is why RTL bugs pose a severe security threat in practice.”
- “We call these the HardFail properties of a bug:”
- “Cross-modular effects (HF-1).” […]
- “Timing-flow gap (HF-2).” […] “In practice, this leads to vast sources of information leakage due to software-exploitable timing channels (see Section IX).” […]
- “Cache-state gap (HF-3).” […] “In particular, current tools reason about the architectural state of a processor by exclusively focusing on the state of registers. However, this definition of the architectural state completely discards that modern processors feature a highly complex microarchitecture and diverse hierarchy of non-register caches. This problem is amplified as these caches have multiple levels and shared across multiple privilege levels. Caches represent a state that is influenced directly or indirectly by many control-path signals.” […]
- “Hardware/firmware interactions (HF-4).” […] “Hence, reasoning on whether an RTL bug exists is inconclusive when considering the hardware RTL in isolation.” […]
- “On analyzing the RTL of Ariane, we observed that TLB page faults due to illegal accesses occur in a different number of clock cycles than page faults that occur due to unmapped memory (we contacted the developers and they acknowledged the vulnerability).”
- “Once the instruction is retired, the execution mode of the core is changed to the unprivileged level, but the entries that were prefetched into the cache (at the system privilege level) do not get flushed.”
- “We emphasize that in a real-world security testing (see Section II), engineers will not have prior knowledge of the specific vulnerabilities they are trying to find. Our goal, however, is to investigate how an industry-standard tool can detect RTL bugs that we deliberately inject in an open-source SoC and have prior knowledge of (see Table I).”
- “Our results in this study are based on two formal techniques: Formal Property Verification (FPV) and Security Path Verification (SPV).”
- “To describe our assertions correctly, we examined the location of each bug in the RTL and how it is manifested in the behavior of the surrounding logic and input/output relationships. Once we specified the security properties using assert, assume and cover statements, we determined which RTL modules we need to model to prove these assertions.”
- “Out of the 31 bugs we investigated, shown in Table I, using the formal verification techniques described above, only 15 or 48%, were detected. While we tried to detect all 31 bugs formally, we were only able to formulate security properties for only 17 bugs. This indicates that the main challenge with using formal verification tools is identifying and expressing security properties that the tools are capable of capturing and checking.”
- “Our results, shown in the SPV and FPV bars of Figure 3, indicate that integer overflow and address overlap bugs had the best detection rates, 80% and 100%, respectively.”
- “The implications of these findings are especially grave for real-world more complex SoC designs where these bug classes are highly relevant from a security standpoint.”
- “we replaced the PULP_SECURE variable, which controls access privileges to the registers, with the PULP_SEC variable.”
- “We present next the results of our second case study. 54 teams of researchers participated in Hack@DAC 2018, a recently conducted capture-the-flag competition to identify hardware bugs that were injected deliberately in real-world open-source SoC designs. This is the equivalent of bug bounty programs that semiconductor companies offer”
- “The goal is to investigate how well these bugs can be detected through dynamic verification and manual RTL audit without prior knowledge of the bugs.”
- “This RTL vulnerability manifests in the hardware behaving in the following way. When an error signal is generated on the memory bus while the underlining logic is still handling an outstanding transaction, the next signal to be handled will instead be considered operational by the module unconditionally.”
- “While existing industry SoCs support hot-fixes by microcode patching, this approach is inherently limited to a handful of changes to the instruction set architecture, e.g., modifying the interface of individual complex instructions and adding or removing instructions. Thus, such patches at this higher abstraction level in the firmware only act as a "symptomatic" fix that circumvent the RTL bug.”
- “VeriCoq based on the Coq proof assistant transforms the Verilog code that describes the hardware design into proof-carrying code.”
- “Finally, computational scalability to verifying real-world complex SoCs remains an issue given that the proof verification for a single AES core requires 30 minutes to complete”
- “Murφ model”
- “Information flow analysis (such as SPV) are better suited for this purpose where a data variable or input is assigned a security label (or a taint), and the taint propagation is monitored.”
- “IFT techniques are proposed at different levels of abstraction: gate-, RT, and language-levels.”
- “At the language level, Caisson and Sapper are security-aware HDLs that use a typing system where the designer assigns security “labels” to each variable (wire or register) by the security policies required. However, they both require redesigning the RTL using a new hardware description language which is not practical. SecVerilog [33, 100] overcomes this by extending the Verilog language with a dynamic security type system.”
- “In the Meltdown attack, speculative execution can be exploited on modern processors (affecting all main vendors) to completely bypass all memory access restrictions.”
- “a selection multiplexer to select between AES, SHA1, MD5, and the temperature sensor.”
summary
A technical report discussing how bugs are introduced in the hardware design process and slip through testing tools. Specifically, they define HardFails as RTL bugs that are difficult to detect and can be triggered from software, potentially compromising the entire platform. Their classification into four HardFail properties {Cross-modular effects, Timing-flow gap, Cache-state gap, Hardware/Firmware interactions} is non-exhaustive (as pointed out in the conclusion). In my opinion, the paper is too verbose when discussing the hardware development process and laying out the advantages/disadvantages of various tools. I think it could have been more concise (e.g. in “Proof assistant and theorem-proving” the scalability issue is mentioned twice).
Besides that, I think it gives a nice overview of the issues hardware design has to deal with and yes, we need better tool support. But the paper (obviously) offers no solution to the mentioned problems.
Catherine pointed out that CLKScrew in Table 2 does not need Firmware interaction and Foreshadow has nothing to do with Firmware interaction.
You Really Shouldn't Roll Your Own Crypto: An Empirica… §
Title: “You Really Shouldn't Roll Your Own Crypto: An Empirical Study of Vulnerabilities in Cryptographic Libraries” by Jenny Blessing, Michael A. Specter, Daniel J. Weitzner [url] [dblp]
Published in 2021-07 at and I read it in 2021-10
Abstract: The security of the Internet rests on a small number of opensource cryptographic libraries: a vulnerability in any one of them threatens to compromise a significant percentage of web traffic. Despite this potential for security impact, the characteristics and causes of vulnerabilities in cryptographic software are not well understood. In this work, we conduct the first comprehensive analysis of cryptographic libraries and the vulnerabilities affecting them. We collect data from the National Vulnerability Database, individual project repositories and mailing lists, and other relevant sources for eight widely used cryptographic libraries.
quotes
- “We examine eight of the most widely used cryptographic libraries and build a dataset of the 300+ entries in the National Vulnerability Database (NVD) [55] for these systems. In our analysis, we combine data from the NVD with information scraped from the projects’ GitHub repositories, internal mailing lists, project bug trackers, and various other external references. We extensively characterize the vulnerabilities originating in cryptographic software, measuring exploitable lifetime, error type, and severity to better understand their security impact on cryptographic software.”
- “Our findings include the following: just 27.2% of vulnerabilities in cryptographic software are cryptographic issues as defined by the NVD, while 37.2% of errors are related to memory management or corruption, suggesting that developers should focus their efforts on systems-level implementation issues. The median exploitable lifetime of a vulnerability in a cryptographic library is 4.18 years, providing malicious actors a substantial window of exploitation. At least one vulnerability is introduced for every thousand lines of code added in the most widely used cryptographic library, OpenSSL; and the rate of vulnerability introduction is up to three times as high in cryptographic software as in non-cryptographic software.”
- “When a new CVE (Common Vulnerabilities and Exposures) ID [33] is created, the NVD calculates a severity score (CVSS) [56] and performs additional analysis before adding the vulnerability to the NVD.”
- “We scrape CVE data from two third-party platforms, CVE Details [1] and OpenCVE [2], which contain much the same data as the official NVD but organize CVEs by product and vendor, enabling us to retrieve all CVEs for a particular system.”
- […] “and so we include data scraped from both CVE Details and OpenCVE.”
- “In total, our dataset consists of n=312 CVEs in cryptographic libraries and 2,000+ CVEs in non-cryptographic software. In our study of cryptographic vulnerability characteristics, we consider only CVEs published by the NVD between 2010 and 2020, inclusive.”
- “Prior work [40] has demonstrated significant differences in vulnerability causes in memory-unsafe C/C++ source code compared to systems written in memory-safe languages such as Java.”
- “In cryptographic software, we consider only cryptographic libraries that have at least 10 CVEs published from 2010 - 2020.”
- “We define cyclomatic complexity as the number of linearly independent paths through a system’s source code, following McCabe’s 1976 definition [45].”
- “We use a separate command-line tool, lizard [63], to calculate cyclomatic complexity of all C and C++ source files. Lizard calculates the complexity of each file individually and averages them together, outputting a single average cyclomatic complexity number (CCN).”
- “We therefore define a vulnerability’s lifetime as the period of time in which it can be exploited by a malicious actor.”
- “we observe that OpenSSL has a far greater number of CVEs than any other cryptographic library, with 153 CVEs published during our timeframe of 2010 - 2020 compared to the second-highest count of 43 CVEs in GnuTLS.”
- “Column 7 shows that, on average, around 1 CVE is introduced in OpenSSL for every thousand lines of code added.”
- “LibreSSL was conceived of as a replacement for OpenSSL that maintained prior API compatibility and portability [9], while BoringSSL was developed for internal Google use only.”
- “The OpenBSD team built LibreSSL under the design that the library would only be used on a POSIX-compliant OS with a standard C compiler [59, 19].”
- “The abrupt jettisoning of 22% and 70% of OpenSSL’s codebase by LibreSSL and BoringSSL, respectively, raises the question of what impact this had on the security of the two new codebases.”
- “Table 5 shows that of those 59 CVEs, 44 still affected LibreSSL and just 35 affected BoringSSL. The clear correspondence between the percentage of the OpenSSL codebase removed and the percentage of OpenSSL vulnerabilities removed demonstrates the security implications for reducing codebase size.”
- “Table 6 shows the average cyclomatic complexities over the previous five major versions for each of the eight cryptographic libraries studied and the three non-cryptographic systems selected.”
- “For Ubuntu, of the 2,187 CVEs studied the average and median lifetimes are 3.89 and 4.03 years, with a standard deviation of 1.44 years. Of the 509 CVEs in Wireshark, the average and median lifetimes are 1.29 and 1.4 years, with a standard deviation of 0.61.”
- “approximately three out of every four vulnerabilities in cryptographic software are caused by common implementation errors, and particularly by memory management issues, overly bloated software threatens significant implications for library security.”
- “However, cryptography libraries suffer from serious usability issues that make them challenging for non-specialists to navigate.”
- “We found that only 27.2% of vulnerabilities introduced in cryptographic software are actually cryptographic, while 37.2% are memory or resource management issues.”
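The cyclomatic complexity measure quoted above (McCabe's count of one plus the decision points per function, averaged by the lizard tool into a single CCN) can be illustrated with a small sketch. The regex-based counting below over made-up C snippets is a crude approximation of my own, not how lizard actually parses source code.

```python
# Rough illustration of McCabe cyclomatic complexity (1 + number of decision
# points per function) and of averaging it across functions, loosely analogous
# to the average CCN that lizard reports. The C snippets are hypothetical.
import re

DECISION = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")

def cyclomatic_complexity(function_body: str) -> int:
    return 1 + len(DECISION.findall(function_body))

functions = {  # hypothetical C function bodies
    "parse_header": """
        if (len < 4) return -1;
        for (i = 0; i < n; i++) { if (buf[i] == 0 && strict) break; }
    """,
    "checksum": "return a + b;",
}

ccns = {name: cyclomatic_complexity(body) for name, body in functions.items()}
print("per-function CCN:", ccns)
print("average CCN:", sum(ccns.values()) / len(ccns))
```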
summary
Well-designed methodology. All decisions made are well-documented and put into context. I am still a little bit skeptical about the statement “You really shouldn't roll your own crypto”. The data shows that cryptographic bugs occur and that cryptographic libraries should prevent errors on an API level. However, it does not really show results regarding “own crypto”: it provides no examples of custom-designed cryptographic libraries or their effects in industry.
Prior work:
- Ozment et al [48]
- Zimmermann et al [64]
- Azad et al [29]
- Lazar et al [42]
- Walden et al [61]
- Li et al [44]
- Shahzad et al [53]
- Rescorla et al [50]
π is the Minimum Value for Pi §
Title: “π is the Minimum Value for Pi” by C. L. Adler, James Tanton [url] [dblp]
Published in 2000 at and I read it in 2021-10
Abstract:
summary
If we consider Pi as the ratio of a circle's circumference to its diameter, the value depends on the chosen metric. Varying the metric (the p-norm, for p from 1 to ∞) gives values that decrease from 4 down to π at p = 2 and rise back up to 4 again; thus π is the minimum value of Pi. An analytic proof follows. Nice little result. Also attached is a proof that a² + b² ≥ 2ab: if the bright grey area equals a·b and the dark grey area equals a·b as well, together they cover an area of 2·a·b. If the dark grey area outside the large square is a² and the large square is b², then a² + b² ≥ 2ab because of the white square at the top.
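As a quick numeric check of that claim (my own illustration, not from the paper), the sketch below estimates Pi in the p-norm by measuring the unit p-circle's circumference in that same norm and dividing by the diameter 2; the estimate dips to π at p = 2 and approaches 4 as p goes to 1 or grows large.

```python
# Numeric illustration (not from the paper): Pi measured in the p-norm is the
# p-norm length of the unit p-circle |x|^p + |y|^p = 1 divided by its diameter 2.
# The minimum over p is attained at p = 2, where the value is the usual pi.
import math

def pi_p(p: float, n: int = 20000) -> float:
    def point(t: float):
        c, s = math.cos(t), math.sin(t)
        # signed-power parametrization keeps |x|^p + |y|^p = 1
        return (math.copysign(abs(c) ** (2.0 / p), c),
                math.copysign(abs(s) ** (2.0 / p), s))

    def dist(a, b):  # distance between two points in the p-norm
        return (abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p) ** (1.0 / p)

    perimeter, prev = 0.0, point(0.0)
    for k in range(1, n + 1):
        cur = point(2.0 * math.pi * k / n)
        perimeter += dist(prev, cur)
        prev = cur
    return perimeter / 2.0  # the unit ball's diameter is 2 in every p-norm

for p in (1.0, 1.25, 1.5, 2.0, 3.0, 6.0):  # as p grows, Pi_p tends back to 4
    print(f"p = {p:<4}  Pi_p ≈ {pi_p(p):.5f}")
```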