Deriving algorithms for computing modulo constant n

Written on 2021-06-15 in 2513 words ✍️.
Part of cs IT-security pqcrypto software-development

Motivation

It is trivial to compute the remainder with respect to power of 2 on a computer. But what about different bases?

Notational remarks

\(i_b\) denotes integer i written in base b. For example, \(20_{10} = 10100_2\).
Modulo operation:
- The modulo operation returns the remainder. A division divides dividend by divisor and yields quotient and remainder.
- In computer science (and discrete mathematics), it is common to write \(X \bmod{Q}\).
- In programming, we write X % Q.
- And in algebra, we write X ∈ ℤ_Q. For preliminaries, please read up on modular arithmetics. In the following, I will use the notation from algebra.

Computational remarks

In order to find the remainder of the division X ÷ Q, you can apply the following algorithm:

Is X ∈ [0, …, Q-1]? If so, goto 4.
If X < 0, let X be X + Q. Else let X be X - Q.
goto 1.
X is the remainder.

However, this algorithm takes long. There are many addition/subtraction operations. We want to be faster.

For the experts, I am discussing the Euclidean definition of the modulo operation here.

In ℤ_2ᴺ

Claim: Modular reduction in ℤ_2ᴺ works by taking the N least significant bits.
Modular reductions in ℤ_2ᴺ are trivial on a modern computer. Since modern computer use binary representations, we actually have some advantage for base 2.
Example: Let us consider some numbers in ℤ in decimal and binary. With N=3, we also consider ℤ₈:

\[ \begin{array}{rlcccc} 20_{10} &= 10100_2 & \text{ in } \mathbb{Z} & = & 100_2 & \text{ in } \mathbb{Z}_8 \\ 16_{10} &= 10000_2 & & = & 000_2 & \text{ in } \mathbb{Z}_8 \\ 15_{10} &= 1111_2 & & = & 111_2 & \text{ in } \mathbb{Z}_8 \\ 4_{10} &= 100_2 & & = & 100_2 & \text{ in } \mathbb{Z}_8 \\ 3_{10} &= 11_2 & & = & 011_2 & \text{ in } \mathbb{Z}_8 \\ 2_{10} &= 10_2 & & = & 010_2 & \text{ in } \mathbb{Z}_8 \end{array} \]
Why?

\[ \begin{align} 10100_2 &= 1 \cdot 2^4 + 0 \cdot 2^3 \hspace{24pt} + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 \\ &= (1 \cdot 2^1 + 0 \cdot 2^0) \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 \end{align} \]

Apparently, the first parenthesis only contains multiples of \(2^3\) and can thus be discarded in ℤ₈ (since \(2^3 = 0\)). We’re left with \(1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = 100_2\).
Conclusion: True, because higher bits are a multiple of 2ᴺ

In ℤ_2ᴺ-1

Claim: We split X into chunks of N positions and add up all chunks.
Example: Let us consider \(20_{10} = 10100_2\) in ℤ₇ (thus N=3). Interestingly, we do not discard the upper bits, but add them:

\[ 20_{10} = 10100_2 = 10_2 + 100_2 = 110_2 \]

Our testcase is actually true: \(20_{10} = 6_{10}\) in ℤ₇.
Why? In order to show this claim, we consider X and divisor 7 in binary expansion:

\[ \begin{align} X &=: d_6 \cdot 2^6 + d_5 \cdot 2^7 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ 7_{10} &= 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 \end{align} \]

We apply polynomial division:

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^2 + 2^1 + 2^0 \\ &= d_6 \cdot 2^4 + (d_5 - d_6) \cdot 2^3 + (d_4 - d_5) \cdot 2^2 + (d_3 - d_4 + d_6) \cdot 2^1 + (d_2 - d_3 + d_5 - d_6) \\ &\text{with remainder } (d_1 - d_2 + d_4 - d_5) \cdot 2^1 + (d_0 - d_2 + d_3 - d_5 + d_6) \cdot 2^0 \end{align} \]

(I am skipping the computation here, since it is lengthy and culturally specific how to write it)
Consider these transformations for the remainder:

\[ \begin{align} &(d_1 - d_2 + d_4 - d_5) \cdot 2^1 + (d_0 - d_2 + d_3 - d_5 + d_6) \cdot 2^0 \text{ in } \mathbb{Z}_7 \\ &= (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 + (-1) \cdot (d_2 + d_5) \cdot 2^1 + (-1) \cdot (d_2 + d_5) \cdot 2^0 \\ &= (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 + (5) \cdot (d_2 + d_5) \cdot 2^1 + (6) \cdot (d_2 + d_5) \cdot 2^0 \\ &= (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 + (2^2 + 2^0) \cdot (d_2 + d_5) \cdot 2^1 + (2^2 + 2^1) \cdot (d_2 + d_5) \cdot 2^0 \\ &= (d_2 + d_5) \cdot 2^2 + (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 \end{align} \]

Recognize that \(-1 = 6\) and \(-2 = 5\) in ℤ₇. Now, if we denote the digits of a number in one row and add up the rows, we can see our claim:

\[ \begin{array}{ccc} d_2 & d_1 & d_0 \\ d_5 & d_4 & d_3 \\ & & d_6 \\ \end{array} \]

We can see one chunk per row and if we add up chunks, we get the result.
Conclusion: Chunking and adding is true, because the remainder has a representation which corresponds to this process.

Looking at data

We can actually do this process for any constant divisor. As a result, we get a faster algorithm we can easily implement in assembly. Be aware that successive addition might yield overflows. This means you might have to apply the algorithm several times to actually get the correct remainder.

I was neither able to automate this in sagemath nor found it in Don Knuth’s “The Art Of Computer Programming”. So here is my manually evaluated data:

In ℤ₃

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^1 + 2^0 \\ &= d_6 \cdot 2^5 + (d_5 - d_6) \cdot 2^4 + (d_4 - d_5 + d_6) \cdot 2^3 + (d_3 - d_4 + d_5 - d_6) \cdot 2^2 \\ &\hspace{10pt} + (d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^1 + (d_1 - d_2 + d_3 - d_4 + d_5 - d_6) \cdot 2^0 \\ & \text{remainder } (d_0 - d_1 + d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_1 & d_0 \\ d_3 & d_2 \\ d_5 & d_4 \\ & d_6 \\ \end{array} \]

In ℤ₅

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^2 + 2^0 \\ &= d_6 \cdot 2^4 + d_5 \cdot 2^3 + (d_4 - d_6) \cdot 2^2 + (d_3 - d_5) \cdot 2^1 + (d_2 - d_4 + d_6) \cdot 2^0 \\ & \text{remainder } (d_1 - d_3 + d_5) \cdot 2^1 + (d_0 - d_2 + d_4 - d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_2 & d_1 & d_0 \\ d_6 & d_3 & d_3 \\ & d_5 & d_4 \end{array} \]

In ℤ₆

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^2 + 2^1 \\ &= d_6 \cdot 2^4 + (d_5 - d_6) \cdot 2^3 + (d_4 - d_5 + d_6) \cdot 2^2 + (d_3 - d_4 + d_5 - d_6) \cdot 2^1 \\ &\hspace{10pt} + (d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^0 \\ & \text{remainder } (d_1 - d_2 + d_3 - d_4 + d_5 - d_6) \cdot 2^1 + (d_0) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_2 & d_1 & d_0 \\ d_4 & d_3 & \\ d_6 & d_5 & \end{array} \]

In ℤ₉

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^0 \\ &= d_6 \cdot 2^3 + d_5 \cdot 2^2 + d_4 \cdot 2^1 + (d_3 - d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_5) \cdot 2^2 + (d_1 - d_4) \cdot 2^1 + (d_0 - d_3 + d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ & d_4 & d_4 & d_4 \\ & d_5 & & d_5 \\ & & & d_6 \end{array} \]

In ℤ₁₀

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^1 \\ &= d_6 \cdot 2^3 + d_5 \cdot 2^2 + (d_4 - d_6) \cdot 2^1 + (d_3 - d_5) \cdot 2^0 \\ & \text{remainder } (d_2 - d_4 + d_6) \cdot 2^2 + (d_1 - d_3 + d_5) \cdot 2^1 + d_0 \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ & d_4 & d_4 & \\ & d_6 & d_5 & \end{array} \]

In ℤ₁₁

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^1 + 2^0 \\ &= d_6 \cdot 2^3 + d_5 \cdot 2^2 + (d_4 - d_6) \cdot 2^1 + (d_3 - d_5 - d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_4 - d_5 + d_6) \cdot 2^2 + (d_1 - d_3 - d_4 + d_5 + 2 \cdot d_6) \cdot 2^1 \\ &\hspace{34pt} + (d_0 - d_3 + d_5 + d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_5 & d_4 & d_5 & d_4 \\ d_6 & & & d_6 \end{array} \]

In ℤ₁₂

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^2 \\ &= d_6 \cdot 2^3 + (d_5 - d_6) \cdot 2^2 + (d_4 - d_5 + d_6) \cdot 2^1 + (d_3 - d_4 + d_5 - d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^2 + (d_1) \cdot 2^1 + (d_0) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_5 & d_4 & & \\ & d_6 & & \end{array} \]

In ℤ₁₃

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^2 + 2^0 \\ &= d_6 \cdot 2^3 + (d_5 - d_6) \cdot 2^2 + (d_4 - d_5 + d_6) \cdot 2^1 + (d_3 - d_4 + d_5 - 2 \cdot d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_3 + d_4 - 2 \cdot d_5 + 3 \cdot d_6) \cdot 2^2 + (d_1 - d_4 + d_5 - d_6) \cdot 2^1 \\ &\hspace{34pt} + (d_0 - d_3 + d_4 - d_5 + 2 \cdot d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_6 & d_5 & d_4 & d_4 \\ & d_6 & d_5 & \end{array} \]

In ℤ₁₄

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^2 + 2^1 \\ &= d_6 \cdot 2^3 + (d_5 - d_6) \cdot 2^2 + (d_4 - d_5) \cdot 2^1 + (d_3 - d_4 + d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_3 + d_5 - d_6) \cdot 2^2 + (d_1 - d_3 + d_5 - d_6) \cdot 2^1 + (d_0) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_6 & d_5 & d_4 & \end{array} \]

Conclusion

We can derive an algorithm using shifts and addition to compute the remainder by any constant.
This algorithm is faster than a trivial loop with subsequent additions/subtractions.
For many constants, the compiler knows even faster algorithms. You can look at the assembly output with godbolt (e.g. ℤ₂₄).
Another interesting reference: Stackoverflow: “Efficient (cycles wise) algorithm to compute modulo 25?”

Deriving algorithms for computing modulo constant n

Motivation

Notational remarks

Computational remarks

In ℤ2ᴺ

In ℤ2ᴺ-1

Looking at data

In ℤ3

In ℤ5

In ℤ6

In ℤ9

In ℤ10

In ℤ11

In ℤ12

In ℤ13

In ℤ14

Conclusion

In ℤ_2ᴺ

In ℤ_2ᴺ-1

In ℤ₃

In ℤ₅

In ℤ₆

In ℤ₉

In ℤ₁₀

In ℤ₁₁

In ℤ₁₂

In ℤ₁₃

In ℤ₁₄