Division by constant in assembly

Motivation

CRYSTALS-Kyber is one of the schemes in the third round of the NIST post-quantum standardization project. In particular, the (reference implementation as well as) optimized implementation uses a divison by KYBER_Q (= 3329) in the decoding step. Our claim is that division by a constant is often represented as multiplication (sometimes called “division by invariant integers using multiplication”). But on which platforms and optimization levels?

Code snippet

The following code snippet can be inserted on godbolt to reproduce the data:

#include <stdint.h>
int square(uint16_t num) { return num / 3329; }

godbolt results

Table 1. Analysis which instruction is used for representing division on godbolt with different setups
Platform	-O3	Result
x86-64 gcc 10.2	no	`imul`, no div
x86-64 gcc 10.2	yes	`imul`, no div
x86-64 gcc 8.3	no	`imul`, no div
x86-64 gcc 8.3	yes	`imul`, no div
x86-64 clang 11.0.0	no	`idiv`
x86-64 clang 11.0.0	yes	`imul`
ARM gcc 9.2.1	no	`umull`, no div
ARM gcc 9.2.1	yes	`umull`, no div
ARM64 gcc 8.2	no	`umull`, no div
ARM64 gcc 8.2	yes	`umull`, no div
armv8-a clang 11.0.0	no	`sdiv`
armv8-a clang 11.0.0	yes	`umull`
RISC-V rv32gc clang 10.0.0	no	`mulhu`
RISC-V rv32gc clang 10.0.0	yes	`mulhu`
WebAssembly clang (trunk)	no	`i32.div_s`
WebAssembly clang (trunk)	yes	`i32.div_u`

Result

I think the overall conclusion is roughly: This is compiler-specific and less platform-specific. gcc always uses multiplication. LLVM prefers to use division unless it is an optimized build.

Credits

The idea and background on this topic was contributed by my co-author, Peter. Thank you!