Comparison of Kyber, NewHope, BLISS, RSA, ECC, and AES on Cortex-M4

Written on 2020-07-10 in 1100 words ✍️.

Motivation

Is post-quantum cryptography ready for deployment? Researchers have developed many cryptographic schemes that are currently under evaluation in the NIST post-quantum cryptography (PQC) standardization process. In general, their performance, key sizes, and signature sizes are worse (i.e. slower and larger) than those of classical cryptography based on RSA or ECC. Thus, they are more cumbersome to deploy and might degrade the user experience wherever fast feedback loops are desirable. But what are the actual numbers?

We have two dimensions: time and space. We can compare the total runtime or the runtime of individual steps (e.g. key generation, signing, verification). And we can consider the memory required on the stack (RAM), in static sections (RAM), or for code (ROM). To get the most accurate data, we would need state-of-the-art implementations with comparable security features, compiled with the same optimization flags and run on the same device. Such an evaluation is possible (agreeing on comparable security features is probably the most difficult and most debated part), but in this article we settle for the best available approximation: the ARM Cortex-M4 family is commonly used as a reference platform in cryptography, so we take the numbers from recent papers that benchmark on it.
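Since all numbers below are cycle counts, a rough wall-clock estimate needs a clock frequency. The following minimal Python sketch converts cycles to milliseconds. The 168 MHz default is an assumed clock (the maximum of STM32F405/407 Cortex-M4 parts), not a value taken from the cited papers, and effects such as flash wait states are ignored.

```python
def cycles_to_ms(cycles: float, clock_hz: float = 168e6) -> float:
    """Convert a cycle count to milliseconds at the given core clock.

    The 168 MHz default is an assumed clock frequency for illustration;
    flash wait states and other memory effects are ignored.
    """
    return cycles / clock_hz * 1e3


# Cycle counts on Cortex-M4 taken from the signature table below.
examples = {
    "BLISS keygen": 367_859_092,
    "BLISS sign": 5_927_441,
    "ECC-256 sign": 13_102_239,
    "RSA-2048 sign": 228_068_226,
}

for name, cycles in examples.items():
    print(f"{name:>14}: {cycles_to_ms(cycles):8.2f} ms")
```

At this assumed clock, BLISS signing takes about 35 ms, whereas BLISS key generation takes about 2.2 s.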

Disclaimer: The data should give a rough idea. Use with caution!

Cryptographic schemes

Data

AES on Cortex-M3 and Cortex-M4 (data from [1]). m denotes the message length in bytes; N/A means no value is given.

Algorithm | Speed M3 (cycles) | Speed M4 (cycles) | ROM (bytes) | RAM (bytes)
AES-128 key expansion | 243.9 | 254.9 | 742 (code) + 1024 (data) | 176 (in/out) + 32 (stack)
AES-128 single block encryption | 639.5 | 644.7 | 1970 (code) + 1024 (data) | 176+2m (in/out) + 44 (stack)
AES-128 encryption/decryption in CTR mode | 531.8 | 537.9 | 2128 (code) + 1024 (data) | 192+2m (in/out) + 72 (stack)
AES-192 key expansion | 232.9 | 240.2 | 682 (code) + 1024 (data) | 208 (in/out) + 32 (stack)
AES-192 encryption/decryption in CTR mode | 651.0 | 656.0 | 2512 (code) + 1024 (data) | 224+2m (in/out) + 72 (stack)
AES-256 key expansion | 315.8 | 319.9 | 958 (code) + 1024 (data) | 240 (in/out) + 32 (stack)
AES-256 encryption/decryption in CTR mode | 767.9 | 774.6 | 2896 (code) + 1024 (data) | 256+2m (in/out) + 72 (stack)
AES-128 key expansion to bitsliced state | 1027.8 | 1033.8 | 3434 (code) + 1036 (data) | 368 (in/out) + 188 (stack)
Constant-time bitsliced AES-128 encryption/decryption in CTR mode | 1618.6 | 1619.6 | 11806 (code) + 12 (data) | 368+2m (in/out) + 108 (stack)
Masked constant-time bitsliced AES-128 encryption/decryption in CTR mode | N/A | 2132.51 (generating randomness) + 5291.6 (rest) | 39224 (code) + 12 (data) | 368+2m (in/out) + 1312 (storing randomness) + 276 (stack rest)
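The in/out RAM column of the table-based rows follows a simple pattern: AES with Nr rounds stores Nr + 1 round keys of 16 bytes (176, 208 and 240 bytes for AES-128/192/256), and the CTR rows add another 16 bytes plus input and output buffers of m bytes each. A minimal sketch reproducing these entries, assuming (this reading is not stated in the table) that the extra 16 bytes hold the counter block:

```python
# AES rounds Nr per key size, as specified in FIPS-197.
ROUNDS = {128: 10, 192: 12, 256: 14}


def expanded_key_bytes(key_bits: int) -> int:
    """Key schedule size: Nr + 1 round keys of 16 bytes each."""
    return 16 * (ROUNDS[key_bits] + 1)


def ctr_ram_bytes(key_bits: int, m: int, stack: int = 72) -> int:
    """Rough RAM estimate for table-based AES-CTR on an m-byte message.

    Assumption: the in/out figure is the key schedule, one 16-byte
    counter block, and separate m-byte input and output buffers; the
    default stack of 72 bytes is the value listed in the table.
    """
    counter_block = 16  # assumed meaning of the extra 16 bytes
    return expanded_key_bytes(key_bits) + counter_block + 2 * m + stack


for bits in (128, 192, 256):
    print(f"AES-{bits}: key schedule {expanded_key_bytes(bits)} B, "
          f"CTR on 1 KiB: {ctr_ram_bytes(bits, m=1024)} B")
```

For AES-128 and a 1 KiB message this gives 176 + 16 + 2 · 1024 + 72 = 2312 bytes, so for longer messages the buffers rather than the cipher state dominate RAM usage.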

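Signature schemes on Cortex-M4 (data from [2]). No key generation figures are given for RSA.

Scheme | KeyGen (cycles) | Sign (cycles) | Verify (cycles)
BLISS | 367,859,092 | 5,927,441 | 1,002,299
RSA-1024 | N/A | 30,627,432 | 1,573,079
RSA-2048 | N/A | 228,068,226 | 6,195,481
ECC-192 | 7,400,421 | 7,720,020 | 14,716,374
ECC-224 | 9,849,334 | 10,414,487 | 19,558,528
ECC-256 | 12,713,277 | 13,102,239 | 24,702,099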
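The RSA rows agree with textbook complexity estimates. With schoolbook (quadratic) multiplication, a private-key operation uses an exponent as long as the modulus and therefore scales roughly cubically with the modulus size, while verification with a small fixed public exponent (assumed here, e.g. 65537) needs a constant number of multiplications and scales roughly quadratically. Doubling the modulus should thus cost about 8× for signing and 4× for verification. The same cycle ratios also quantify how BLISS compares with ECC-256. A small Python sketch computing these ratios from the table:

```python
# Cycle counts on Cortex-M4 from the signature table above (data from [2]):
# (keygen, sign, verify); None where no key generation value is given.
CYCLES = {
    "BLISS": (367_859_092, 5_927_441, 1_002_299),
    "RSA-1024": (None, 30_627_432, 1_573_079),
    "RSA-2048": (None, 228_068_226, 6_195_481),
    "ECC-256": (12_713_277, 13_102_239, 24_702_099),
}
OPS = ("keygen", "sign", "verify")


def ratio(a: str, b: str, op: str) -> float:
    """Return cycles(a) / cycles(b) for the given operation."""
    i = OPS.index(op)
    return CYCLES[a][i] / CYCLES[b][i]


# Doubling the RSA modulus: textbook estimate ~8x for sign, ~4x for verify.
for op in ("sign", "verify"):
    print(f"RSA-2048 / RSA-1024, {op}: {ratio('RSA-2048', 'RSA-1024', op):.2f}x")

# ECC-256 vs. BLISS: a ratio above 1 means BLISS needs fewer cycles.
for op in OPS:
    print(f"ECC-256 / BLISS, {op}: {ratio('ECC-256', 'BLISS', op):.2f}x")
```

The measured factors are about 7.4 for signing and 3.9 for verification, close to the estimates of 8 and 4. Compared with ECC-256, BLISS signs about 2.2× and verifies about 25× faster, but its key generation is about 29× slower.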

[1] Schwabe, P., & Stoffelen, K. (2016, August). All the AES you need on Cortex-M3 and M4. In International Conference on Selected Areas in Cryptography (pp. 180-194). Springer, Cham. https://ko.stoffelen.nl/papers/sac2016-aesarm.pdf

[2] Oder, T., Pöppelmann, T., & Güneysu, T. (2014, June). Beyond ECDSA and RSA: Lattice-based digital signatures on constrained devices. In 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC) (pp. 1-6). IEEE. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6881437

[4] Kannwischer, M. J., Rijneveld, J., Schwabe, P., & Stoffelen, K. (2019). pqm4: Testing and Benchmarking NIST PQC on ARM Cortex-M4. https://repository.ubn.ru.nl/bitstream/handle/2066/210214/210214.pdf