Motivation
Is post-quantum cryptography ready for deployment? Researchers have developed many cryptographic schemes that are currently under evaluation in the NIST PQCRYPTO competition. In general, performance, key sizes, and signature sizes are worse (i.e. slower or larger) than in classical cryptography based on RSA or ECC. Post-quantum schemes are therefore more cumbersome to deploy and might degrade the user experience (where fast feedback loops are desirable). But what are the numbers?
We have two dimensions: time and space. We can compare the total runtime or the runtime of individual steps. And we can consider the memory required on the stack (RAM), in the static section (RAM), or for code (ROM). To get the most accurate data, we would need state-of-the-art implementations with comparable security features, compiled with the same optimization flags and running on the same device. This can be evaluated (the security features might be the most difficult and most debated part), but in this article we settle for the best approximation we can get. The Cortex-M4 family is commonly used as a reference platform in cryptography, so we take the data from current papers.
Disclaimer: The data should give a rough idea. Use with caution!
Cryptographic schemes
- AES (Advanced Encryption Standard) is the most famous symmetric cryptographic algorithm. It is widely deployed and your CPU might have dedicated instructions for it. It is a primitive often used by PQCRYPTO algorithms.
- RSA (Rivest-Shamir-Adleman) is the classical asymmetric algorithm. However, it was a long way from the original design to the secure implementations we have today. RSA provides an encryption scheme as well as signatures.
- Ed25519 is the EdDSA signature scheme instantiated with a particular elliptic curve (edwards25519). Among elliptic curve cryptography (ECC) implementations, it is considered to offer a good runtime/security tradeoff.
- BLISS (Bimodal Lattice Signature Scheme) is a post-quantum signature scheme.
- NewHope and CRYSTALS-Kyber are post-quantum key encapsulation mechanisms and two of the many lattice-based candidates in the 2nd round of the NIST competition.
Data
Algorithm | Speed M3 (cycles) | Speed M4 (cycles) | ROM (bytes) | RAM (bytes) |
---|---|---|---|---|
AES-128 key expansion | 243.9 | 254.9 | 742 (code) + 1024 (data) | 176 (in/out) + 32 (stack) |
AES-128 single block encryption | 639.5 | 644.7 | 1970 (code) + 1024 (data) | 176+2m (in/out) + 44 (stack) |
AES-128 encryption/decryption in CTR mode | 531.8 | 537.9 | 2128 (code) + 1024 (data) | 192+2m (in/out) + 72 (stack) |
AES-192 key expansion | 232.9 | 240.2 | 682 (code) + 1024 (data) | 208 (in/out) + 32 (stack) |
AES-192 encryption/decryption in CTR mode | 651.0 | 656.0 | 2512 (code) + 1024 (data) | 224+2m (in/out) + 72 (stack) |
AES-256 key expansion | 315.8 | 319.9 | 958 (code) + 1024 (data) | 240 (in/out) + 32 (stack) |
AES-256 encryption/decryption in CTR mode | 767.9 | 774.6 | 2896 (code) + 1024 (data) | 256+2m (in/out) + 72 (stack) |
AES-128 key expansion to bitsliced state | 1027.8 | 1033.8 | 3434 (code) + 1036 (data) | 368 (in/out) + 188 (stack) |
Constant-time bitsliced AES-128 encryption/decryption in CTR mode | 1618.6 | 1619.6 | 11806 (code) + 12 (data) | 368+2m (in/out) + 108 (stack) |
Masked constant-time bitsliced AES-128 encryption/decryption in CTR mode | N/A | 2132.51 (generating randomness) + 5291.6 (rest) | 39224 (code) + 12 (data) | 368+2m (in/out) + 1312 (storing randomness) + 276 (stack rest) |

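
To make the AES numbers easier to compare, the following minimal Python sketch computes how much slower the protected AES-128 CTR variants are than the plain AES-128 CTR row on the M4, and evaluates the RAM formula of that row for a given message. The cycle counts are copied from the table above. The names `overhead` and `ctr_ram_bytes` are hypothetical helpers introduced for illustration, and reading `m` as the message length in bytes is an assumption that the table itself does not state.

```python
# M4 cycle counts copied from the AES table above.
CTR_PLAIN = 537.9                # AES-128 encryption/decryption in CTR mode
CTR_BITSLICED = 1619.6           # constant-time bitsliced variant
CTR_MASKED = 2132.51 + 5291.6    # masked variant: randomness + rest


def overhead(protected, baseline=CTR_PLAIN):
    """Return how many times slower `protected` is than `baseline`."""
    return protected / baseline


def ctr_ram_bytes(m, in_out_fixed=192, stack=72):
    """Evaluate the RAM formula '192+2m (in/out) + 72 (stack)'.

    Assumption: m is the message length in bytes.
    """
    return in_out_fixed + 2 * m + stack


if __name__ == "__main__":
    print(f"bitsliced vs. plain: {overhead(CTR_BITSLICED):.2f}x")
    print(f"masked vs. plain:    {overhead(CTR_MASKED):.2f}x")
    print(f"RAM for m = 1024:    {ctr_ram_bytes(1024)} bytes")
```

With the numbers from the table, the constant-time bitsliced variant comes out at roughly 3 times the cycle count of the plain CTR row, and the masked variant at roughly 14 times. This is a rough comparison only, because the rows differ in more than their side-channel protections.
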

Operation | Cycles (KeyGen) | Cycles (sign) | Cycles (verify) |
---|---|---|---|
BLISS | 367,859,092 | 5,927,441 | 1,002,299 |
RSA-1024 | – | 30,627,432 | 1,573,079 |
RSA-2048 | – | 228,068,226 | 6,195,481 |
ECC-192 | 7,400,421 | 7,720,020 | 14,716,374 |
ECC-224 | 9,849,334 | 10,414,487 | 19,558,528 |
ECC-256 | 12,713,277 | 13,102,239 | 24,702,099 |

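
Cycle counts are hard to relate to user experience, so the following minimal Python sketch converts the signature numbers above into milliseconds. The cycle counts are copied from the table. The clock frequency of 168 MHz is an assumption chosen purely for illustration, since the article does not state one, and `cycles_to_ms` and `format_row` are hypothetical helper names.

```python
# Cycle counts copied from the table above: (keygen, sign, verify).
# None marks the entries that the table leaves empty.
CYCLES = {
    "BLISS": (367_859_092, 5_927_441, 1_002_299),
    "RSA-1024": (None, 30_627_432, 1_573_079),
    "RSA-2048": (None, 228_068_226, 6_195_481),
    "ECC-192": (7_400_421, 7_720_020, 14_716_374),
    "ECC-224": (9_849_334, 10_414_487, 19_558_528),
    "ECC-256": (12_713_277, 13_102_239, 24_702_099),
}

# Assumption: a 168 MHz core clock. Replace with the clock of your device.
ASSUMED_CLOCK_HZ = 168_000_000


def cycles_to_ms(cycles, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert a cycle count to milliseconds; pass missing entries through."""
    if cycles is None:
        return None
    return cycles / clock_hz * 1000.0


def format_row(name, clock_hz=ASSUMED_CLOCK_HZ):
    """Render one table row as 'name: keygen / sign / verify' in ms."""
    cells = []
    for cycles in CYCLES[name]:
        ms = cycles_to_ms(cycles, clock_hz)
        cells.append("n/a" if ms is None else f"{ms:.1f} ms")
    return f"{name}: " + " / ".join(cells)


if __name__ == "__main__":
    for scheme in CYCLES:
        print(format_row(scheme))
```

Under this assumed clock, a BLISS signature takes about 35 ms and a verification about 6 ms. The absolute times scale inversely with the clock frequency, while the ratios between the schemes stay the same.
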
[1] Schwabe, P., & Stoffelen, K. (2016, August). All the AES you need on Cortex-M3 and M4. In International Conference on Selected Areas in Cryptography (pp. 180-194). Springer, Cham. https://ko.stoffelen.nl/papers/sac2016-aesarm.pdf

[2] Oder, T., Pöppelmann, T., & Güneysu, T. (2014, June). Beyond ECDSA and RSA: Lattice-based digital signatures on constrained devices. In 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC) (pp. 1-6). IEEE. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6881437

[4] Kannwischer, M. J., Rijneveld, J., Schwabe, P., & Stoffelen, K. (2019). pqm4: Testing and Benchmarking NIST PQC on ARM Cortex-M4. https://repository.ubn.ru.nl/bitstream/handle/2066/210214/210214.pdf