The title of this blog post is based on the paper “Too Much Crypto” by Jean-Philippe Aumasson, published on 2019-12-31. The following block quotes are taken from his paper (unless stated otherwise). Whereas JP Aumasson discusses the security levels of various schemes and thereby triggered an academic discussion, I only want to give a short summary. My notes from reading the paper read: “Good paper; its survey is its strong suit. However, the particular choice of proposed parameters has little justification”. But this is a difficult point: in general, the justification of a parameter set is heavily debated and hard to argue scientifically. With round 3 of the NIST Post-Quantum Cryptography Standardization process announced a few days ago, djb also raised questions regarding the security claims of NewHope in comparison to CRYSTALS-Kyber (to illustrate the liveliness of this topic).
The question raised is which security level shall be picked for cryptographic primitives.
In general, do not invent your own crypto. Cryptography is a fast-moving field and programmers cannot be expected to keep up. It is too easy to get a false sense of security. Thus, standardization of cryptographic algorithms is important; a standard also communicates the algorithm’s parameter set, which defines the attained security level. Brute-force search is a generic attack on any cryptographic algorithm, so we use the worst-case number of trials required to determine the secret key as a measure of the security level. Of course it also depends on the runtime of the underlying computation, but a layman’s estimate is that anything requiring more than 2^100 trials tends to be secure (this estimate is often more accurate for symmetric crypto than for asymmetric crypto).
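To put such exponents into perspective, here is a minimal back-of-the-envelope sketch. The rate of 2^60 (~10^18) trials per second is a hypothetical, roughly exascale assumption of mine, not a figure from the paper:

```python
# Hypothetical attacker: 2^60 (~10^18) key trials per second, i.e. an
# exascale machine doing one trial per operation -- an optimistic assumption.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_exhaust(security_bits: int, trials_per_second: float = 2.0**60) -> float:
    """Worst-case years needed to try all 2^security_bits keys."""
    return 2.0**security_bits / trials_per_second / SECONDS_PER_YEAR

print(f"2^100: {years_to_exhaust(100):,.0f} years")  # tens of thousands of years
print(f"2^128: {years_to_exhaust(128):.2e} years")   # on the order of 10^13 years
```

Even under this generous assumption, each extra bit doubles the cost, which is why the difference between 2^100 and 2^128 matters less in practice than the quote below suggests.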
128-bit security is often acknowledged as sufficient for most applications
So consider a new algorithm. Shall we standardize parameters to provide a security level of 2^128 or 2^160?
One position is that the security level shall be rather high. The following paragraph is taken from the FrodoKEM specification (section 2.1):
Given the high cost and slow deployment of entirely new cryptographic systems, the desired decades-long lifetime of such systems, and the unpredictable trajectory of quantum computing technology and quantum cryptanalysis over the coming years, we argue that any post-quantum standard should follow a conservative approach that errs comfortably on the side of security and simplicity over performance and (premature) optimization.
The opposite position is what JP Aumasson argues in his paper:
A main point from this article is that, if performance matters, the number of rounds of a primitive shouldn’t be picked randomly as whatever value is sufficiently high. We’ve also challenged the fact that numbers of rounds are never revised (decreased or increased) based on new information and research. We believe that picking a number of rounds should ideally not only be left to cryptographers, but also involve data scientists and risk management experts.
Rarely have numbers of rounds been challenged as too high.
Do we spend too much time computing cryptographic primitives without measurable practical security?
(Quotes from the paper)
Designed in the 1970’s, neither DES nor GOST are practically broken by cryptanalysis.
The speed of symmetric primitives being inversely proportional to their number of rounds, a natural yet understudied question is whether fewer rounds would be sufficient assurance against cryptanalysis’ progress.
Lloyd estimated that “[the] Universe could currently register 10^90 [or 2^299] bits. To register this amount of information requires every degree of freedom of every particle in the Universe.” Applying a more general bound and the holographic principle, Lloyd further calculates that the observable Universe could register approximately 2^399 bits by using all the information capacity of matter, energy, and gravity.
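These powers of ten and two can be cross-checked in a couple of lines. The underlying figure of 10^120 bits for the holographic bound is taken from Lloyd’s paper, not from the text above:

```python
import math

# 10^90 bits ≈ 2^(90 · log2(10)) ≈ 2^299
print(math.log2(10) * 90)   # ≈ 298.97
# Lloyd's holographic bound of 10^120 bits ≈ 2^399
print(math.log2(10) * 120)  # ≈ 398.63
```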
Attacks on AES (Advanced Encryption Standard):
2018: given the capability to query the ciphertexts of three million plaintext blocks, plus read/write access to 46 MiB of memory, 5 AES rounds were broken with 2^21.5 evaluations
2013: given 2^97 chosen plaintexts and 2^100 bytes of storage, 7 rounds of AES-128 were broken with 2^99 encryption operations
→ Aumasson’s comment: “Between 1998 and 2019 attacks got better, but not considerably: in 1998 the initial AES/Rijndael submission document described a generic attack on 6 rounds with 2^72 complexity and 2^34 data. Today 7 rounds can be attacked in time 2^146, which remains highly impractical.”
Attacks on BLAKE2:
2015: “pseudo-preimage” attacks on 7.5 rounds of BLAKE2b with 2^510.3 operations and on 6.75 rounds of BLAKE2s with 2^253.8 operations
→ Aumasson’s comment: “There was no meaningful progress since the differential analysis of BLAKE in 2011”
Attacks on ChaCha:
2016: key-recovery attack on the 7-round version, with 2^237.7 time complexity using output data from 2^96 instances of ChaCha
→ Aumasson’s comment: “Between 2008 and 2019, the estimated complexity of an attack went from 2^248 to 2^235, using the same technique but refined analysis.”
Attacks on SHA-3:
2019: practical collision attacks on 5-round SHA3-224 implemented on GPUs with a running time of 40h; 473h (20 days) for 5-round SHA3-256
→ Aumasson’s comment: “Cryptanalysts seem to have given up on looking for collision attacks. The much higher non-linearity, compared to SHA-1, is indeed intimidating and unlikely (in our opinion) to be worked around.”
Especially SHA-3 is criticized for its high number of rounds. Therefore alternative proposals were made:
One of the members of the Keccak family, “KangarooTwelve”, reduces the number of rounds from 24 to 12, commenting that “clearly, 12 rounds provide less safety margin than the full 24 rounds (…). Still, the safety margin provided by 12 rounds is comfortable as, e.g., the best published collision attacks at time of writing break Keccak only up to 6 rounds.”
A variant of “KangarooTwelve” called “MarsupilamiFourteen” does 14 rounds instead of 12. The two extra rounds are justified as follows: “if one wishes to keep the same safety margin [after modifying other security parameters], an increase of the attack complexity must be compensated by adding rounds”.
Parallelization in KangarooTwelve
I want to address one incidental question about KangarooTwelve: it claims to be “parallelizable”. In a side project I need performant hash algorithms, so what does that mean in the KangarooTwelve context? Their paper says …
The other design choice that gives KangarooTwelve great speed for long messages is the use of a tree hash mode. […] Basically, the mode calls an underlying sponge-based compression function for each 8192-byte chunk of message and finally hashes the concatenation of the resulting digests. We call this the final node growing approach. Clearly, the chunks can be hashed in parallel.
The main advantage of the final node growing approach is that implementers can decide on the degree of parallelism their programs support. A simple implementation could compute everything serially, while another would process two, four or more branches in parallel using multiple cores, or more simply, a SIMD instruction set such as the Intel AVX2.
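The final node growing approach can be sketched in a few lines. This is a toy illustration only: it uses Python’s `hashlib.sha3_256` as a stand-in for the Keccak-p based compression function and hashes the chunks serially, so it is not the real KangarooTwelve construction (which, among other differences, uses custom framing and domain separation):

```python
import hashlib

CHUNK = 8192  # KangarooTwelve splits the message into 8192-byte chunks

def toy_tree_hash(message: bytes) -> bytes:
    """Toy 'final node growing' hash: hash each chunk, then hash the
    concatenation of the chunk digests. NOT real KangarooTwelve."""
    chunks = [message[i:i + CHUNK] for i in range(0, max(len(message), 1), CHUNK)]
    # Each leaf is independent, so these calls could run on separate
    # cores or SIMD lanes.
    leaves = [hashlib.sha3_256(c).digest() for c in chunks]
    # Final node: hash the concatenation of all leaf digests.
    return hashlib.sha3_256(b"".join(leaves)).digest()
```

A real implementation decides at the leaf-hashing step how many branches to process in parallel; the serial list comprehension above corresponds to the “simple implementation” the quote mentions.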
KangarooTwelve is not the only Keccak-based parallel hash mode. In late 2016, NIST published the SP 800-185 standard, including a parallelized hash mode called ParallelHash.
With respect to 256-bit SIMD, the authors claim the following:
On an Intel Core i5-6500 (Skylake), we measured that one evaluation of Keccak-p[1600, nr=12] takes about 450 cycles, while 2 in parallel about 730 cycles and 4×Keccak-p[1600,nr=12] about 770 cycles. This does not include the time needed to add the input bytes to the state. Yet, this clearly points out that the time per byte decreases with the degree of parallelism.
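Dividing the quoted cycle counts by the number of parallel evaluations makes the gain explicit:

```python
# Quoted Skylake measurements: parallel evaluations -> total cycles
measurements = {1: 450, 2: 730, 4: 770}

for lanes, cycles in measurements.items():
    print(f"{lanes}x: {cycles / lanes:.1f} cycles per evaluation")
# The cost per evaluation drops from 450.0 to 365.0 to 192.5 cycles
# as the SIMD width increases.
```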
Proposal by JP Aumasson
9 rounds instead of 10 for AES-128, 10 instead of 12 for AES-192, 11 instead of 14 for AES-256, yielding respectively a 1.1×, 1.2×, and 1.3× speed-up
8 rounds instead of 12 for BLAKE2b, 7 rounds instead of 10 for BLAKE2s, yielding respectively a 1.5× and 1.4× speed-up
8 rounds instead of 20 for ChaCha, yielding a 2.5× speed-up
10 rounds instead of 24 for SHA-3, yielding a 2.4× speed-up
Be aware that SHA-3, for example, has a dedicated web page putting its speed in context with other hash algorithms.
Rarely have numbers of rounds been challenged as too high. A possible reason (simplifying) is that the people competent to constructively question the number of rounds have no incentive to promote faster cryptography, and that those who do have such an incentive lack the expertise to confidently suggest fewer rounds. Thus the status quo, which this article questions.
I like the message of the paper. In the academic world, the security level must be justified. Thus it is easy to pick a conservative parameter set (making reductionist proofs easier) and difficult to pick a more daring (and thus faster) parameter set. However, the effects on the environment, the economy, usability, and so on (due to the additional computation) are discussed too little. Balancing risk is difficult.