Motivation
It is very difficult to predict when multiplication will overflow. As such in memory-unsafe languages, people tend to use the multiplication algorithm without appropriate checks. As a result, rust decided to provide dedicated multiplication methods (e.g. of the primitive data type i32):
-
fn wrapping_mul(self, rhs: i32) → i32
computes the product of both arguments modulo 232 -
fn saturating_mul(self, rhs: i32) → i32
computes the product of both arguments ori32::MAX
/i32::MIN
in case of an overflow/underflow respectively
There are also additional functions which give you more information about overflows:
-
fn overflowing_mul(self, rhs: i32) → (i32, bool)
returns the result ofwrapping_mul
and a boolean value indicating whether an overflow occured -
fn checked_mul(self, rhs: i32) → Option<i32>
returns the product iff no overflow/underflow occurs -
fn unchecked_mul(self, rhs: i32) → i32
computes the product and triggers undefined behavior if an overflow/underflow occurs
Do these functions generate branches? Branches might be of interest for cryptographic applications (constant-time algorithms) and performance (branch freedom). Intuitively, they are going to. They will compute the product of two 32 bits integers as a 64 integer and check whether the high 32 bits of the result are non-zero. But let us evaluate it.
The methodology
We use godbolt and look at the following example, where op
is replaced by one of the two functions above:
pub fn square(num: i32) -> i32 {
num.op(num)
}
Then we look at the assembly and identify jump instructions and/or labels to determine whether a branch was generated. Specifically, we are interested in the following four architectures:
-
x86_64-unknown-linux-gnu
-
armv7a-none-eabi
-
riscv32imac-unknown-none-elf
-
wasm32-unknown-unknown
I used a command like rustc test.rs --emit asm --crate-type lib --target=x86_64-unknown-linux-gnu
. WebAssembly requires the #![no_std]
annotation in the source file. Additionally, we look at corresponding addition operations.
Results
Creates a branch? | wrapping_mul |
saturating_mul |
wrapping_add |
saturating_add |
---|---|---|---|---|
|
no; |
yes; |
no; |
no; |
|
no; |
yes; |
no; |
no; |
|
no; |
yes; |
no; |
yes; |
|
no; |
yes; |
no; |
no; |
In the case of branched saturating methods, the checked method can be found explicitly in the executable.
A deeper look at saturating_add at x86_64-unknown-linux-gnu
.text
.file "test.6a688027-cgu.0"
.section .text._ZN4test6square17h5fb345a26144c432E,"ax",@progbits
.globl _ZN4test6square17h5fb345a26144c432E
.p2align 4, 0x90
.type _ZN4test6square17h5fb345a26144c432E,@function _ZN4test6square17h5fb345a26144c432E:
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
movl %edi, %eax
movl %eax, %edi
addl %edi, %edi
testl %edi, %edi
setns %cl
movzbl %cl, %ecx
addl $2147483647, %ecx
addl %eax, %eax
cmovol %ecx, %eax
movl %eax, 4(%rsp)
movl 4(%rsp), %eax
movl %eax, (%rsp)
movl (%rsp), %eax
popq %rcx
.cfi_def_cfa_offset 8
retq
.Lfunc_end0:
.size _ZN4test6square17h5fb345a26144c432E, .Lfunc_end0-_ZN4test6square17h5fb345a26144c432E
.cfi_endproc
.section ".note.GNU-stack","",@progbit
Apparently, EDX is added with EDX and then an AND is computed between the arguments (testl
). setns %cl
will set the lowest byte of register ECX if SF=0
(thus the sign flag was set after AND). Then a zero-extended byte is converted into a long and a MOV is applied (movzbl
). Then this value is added to 231 - 1. After another add of EAX, cmovol
is executed, which applies a conditional MOV if an overflow occured.
At this point, we can already identify the idea how to circumvent a branched instruction: we look at the highest bit of the argument (remember that function square
above has only one argument). If it is set, we will get an overflow and will overwrite the result with the high value. Otherwise, we will result the non-overflowing sum.
Conclusion
I expected the intuitive result to be true for every platform. But apparently, x86_64, ARM and WASM can handle saturating_add
without branches. Very surprising! One can argue that the branch/condition is hidden inside the instructions, but this was not the original question. I care about branches on instruction level.