Deriving algorithms for computing modulo constant n

✍️ Written on 2021-06-15 in 2513 words. Part of cs IT-security pqcrypto software-development

Motivation

It is trivial to compute the remainder with respect to power of 2 on a computer. But what about different bases?

Notational remarks

  • \(i_b\) denotes integer i written in base b. For example, \(20_{10} = 10100_2\).

  • Modulo operation:

    • The modulo operation returns the remainder. A division divides dividend by divisor and yields quotient and remainder.

    • In computer science (and discrete mathematics), it is common to write \(X \bmod{Q}\).

    • In programming, we write X % Q.

    • And in algebra, we write X ∈ ℤQ. For preliminaries, please read up on modular arithmetics. In the following, I will use the notation from algebra.

Computational remarks

In order to find the remainder of the division X ÷ Q, you can apply the following algorithm:

  1. Is X ∈ [0, …, Q-1]? If so, goto 4.

  2. If X < 0, let X be X + Q. Else let X be X - Q.

  3. goto 1.

  4. X is the remainder.

However, this algorithm takes long. There are many addition/subtraction operations. We want to be faster.

For the experts, I am discussing the Euclidean definition of the modulo operation here.

In ℤ2ᴺ

  • Claim: Modular reduction in ℤ2ᴺ works by taking the N least significant bits.

  • Modular reductions in ℤ2ᴺ are trivial on a modern computer. Since modern computer use binary representations, we actually have some advantage for base 2.

  • Example: Let us consider some numbers in ℤ in decimal and binary. With N=3, we also consider ℤ₈:

    \[ \begin{array}{rlcccc} 20_{10} &= 10100_2 & \text{ in } \mathbb{Z} & = & 100_2 & \text{ in } \mathbb{Z}_8 \\ 16_{10} &= 10000_2 & & = & 000_2 & \text{ in } \mathbb{Z}_8 \\ 15_{10} &= 1111_2 & & = & 111_2 & \text{ in } \mathbb{Z}_8 \\ 4_{10} &= 100_2 & & = & 100_2 & \text{ in } \mathbb{Z}_8 \\ 3_{10} &= 11_2 & & = & 011_2 & \text{ in } \mathbb{Z}_8 \\ 2_{10} &= 10_2 & & = & 010_2 & \text{ in } \mathbb{Z}_8 \end{array} \]
  • Why?

    \[ \begin{align} 10100_2 &= 1 \cdot 2^4 + 0 \cdot 2^3 \hspace{24pt} + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 \\ &= (1 \cdot 2^1 + 0 \cdot 2^0) \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 \end{align} \]

    Apparently, the first parenthesis only contains multiples of \(2^3\) and can thus be discarded in ℤ₈ (since \(2^3 = 0\)). We’re left with \(1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = 100_2\).

  • Conclusion: True, because higher bits are a multiple of 2ᴺ

In ℤ2ᴺ-1

  • Claim: We split X into chunks of N positions and add up all chunks.

  • Example: Let us consider \(20_{10} = 10100_2\) in ℤ₇ (thus N=3). Interestingly, we do not discard the upper bits, but add them:

    \[ 20_{10} = 10100_2 = 10_2 + 100_2 = 110_2 \]

    Our testcase is actually true: \(20_{10} = 6_{10}\) in ℤ₇.

  • Why? In order to show this claim, we consider X and divisor 7 in binary expansion:

    \[ \begin{align} X &=: d_6 \cdot 2^6 + d_5 \cdot 2^7 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ 7_{10} &= 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 \end{align} \]
    \[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^2 + 2^1 + 2^0 \\ &= d_6 \cdot 2^4 + (d_5 - d_6) \cdot 2^3 + (d_4 - d_5) \cdot 2^2 + (d_3 - d_4 + d_6) \cdot 2^1 + (d_2 - d_3 + d_5 - d_6) \\ &\text{with remainder } (d_1 - d_2 + d_4 - d_5) \cdot 2^1 + (d_0 - d_2 + d_3 - d_5 + d_6) \cdot 2^0 \end{align} \]

    (I am skipping the computation here, since it is lengthy and culturally specific how to write it)
    Consider these transformations for the remainder:

    \[ \begin{align} &(d_1 - d_2 + d_4 - d_5) \cdot 2^1 + (d_0 - d_2 + d_3 - d_5 + d_6) \cdot 2^0 \text{ in } \mathbb{Z}_7 \\ &= (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 + (-1) \cdot (d_2 + d_5) \cdot 2^1 + (-1) \cdot (d_2 + d_5) \cdot 2^0 \\ &= (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 + (5) \cdot (d_2 + d_5) \cdot 2^1 + (6) \cdot (d_2 + d_5) \cdot 2^0 \\ &= (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 + (2^2 + 2^0) \cdot (d_2 + d_5) \cdot 2^1 + (2^2 + 2^1) \cdot (d_2 + d_5) \cdot 2^0 \\ &= (d_2 + d_5) \cdot 2^2 + (d_1 + d_4) \cdot 2^1 + (d_0 + d_3 + d_6) \cdot 2^0 \end{align} \]

    Recognize that \(-1 = 6\) and \(-2 = 5\) in ℤ₇. Now, if we denote the digits of a number in one row and add up the rows, we can see our claim:

    \[ \begin{array}{ccc} d_2 & d_1 & d_0 \\ d_5 & d_4 & d_3 \\ & & d_6 \\ \end{array} \]

    We can see one chunk per row and if we add up chunks, we get the result.

  • Conclusion: Chunking and adding is true, because the remainder has a representation which corresponds to this process.

Looking at data

We can actually do this process for any constant divisor. As a result, we get a faster algorithm we can easily implement in assembly. Be aware that successive addition might yield overflows. This means you might have to apply the algorithm several times to actually get the correct remainder.

I was neither able to automate this in sagemath nor found it in Don Knuth’s “The Art Of Computer Programming”. So here is my manually evaluated data:

In ℤ3

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^1 + 2^0 \\ &= d_6 \cdot 2^5 + (d_5 - d_6) \cdot 2^4 + (d_4 - d_5 + d_6) \cdot 2^3 + (d_3 - d_4 + d_5 - d_6) \cdot 2^2 \\ &\hspace{10pt} + (d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^1 + (d_1 - d_2 + d_3 - d_4 + d_5 - d_6) \cdot 2^0 \\ & \text{remainder } (d_0 - d_1 + d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_1 & d_0 \\ d_3 & d_2 \\ d_5 & d_4 \\ & d_6 \\ \end{array} \]

In ℤ5

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^2 + 2^0 \\ &= d_6 \cdot 2^4 + d_5 \cdot 2^3 + (d_4 - d_6) \cdot 2^2 + (d_3 - d_5) \cdot 2^1 + (d_2 - d_4 + d_6) \cdot 2^0 \\ & \text{remainder } (d_1 - d_3 + d_5) \cdot 2^1 + (d_0 - d_2 + d_4 - d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_2 & d_1 & d_0 \\ d_6 & d_3 & d_3 \\ & d_5 & d_4 \end{array} \]

In ℤ6

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^2 + 2^1 \\ &= d_6 \cdot 2^4 + (d_5 - d_6) \cdot 2^3 + (d_4 - d_5 + d_6) \cdot 2^2 + (d_3 - d_4 + d_5 - d_6) \cdot 2^1 \\ &\hspace{10pt} + (d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^0 \\ & \text{remainder } (d_1 - d_2 + d_3 - d_4 + d_5 - d_6) \cdot 2^1 + (d_0) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_2 & d_1 & d_0 \\ d_4 & d_3 & \\ d_6 & d_5 & \end{array} \]

In ℤ9

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^0 \\ &= d_6 \cdot 2^3 + d_5 \cdot 2^2 + d_4 \cdot 2^1 + (d_3 - d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_5) \cdot 2^2 + (d_1 - d_4) \cdot 2^1 + (d_0 - d_3 + d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ & d_4 & d_4 & d_4 \\ & d_5 & & d_5 \\ & & & d_6 \end{array} \]

In ℤ10

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^1 \\ &= d_6 \cdot 2^3 + d_5 \cdot 2^2 + (d_4 - d_6) \cdot 2^1 + (d_3 - d_5) \cdot 2^0 \\ & \text{remainder } (d_2 - d_4 + d_6) \cdot 2^2 + (d_1 - d_3 + d_5) \cdot 2^1 + d_0 \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ & d_4 & d_4 & \\ & d_6 & d_5 & \end{array} \]

In ℤ11

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^1 + 2^0 \\ &= d_6 \cdot 2^3 + d_5 \cdot 2^2 + (d_4 - d_6) \cdot 2^1 + (d_3 - d_5 - d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_4 - d_5 + d_6) \cdot 2^2 + (d_1 - d_3 - d_4 + d_5 + 2 \cdot d_6) \cdot 2^1 \\ &\hspace{34pt} + (d_0 - d_3 + d_5 + d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_5 & d_4 & d_5 & d_4 \\ d_6 & & & d_6 \end{array} \]

In ℤ12

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^2 \\ &= d_6 \cdot 2^3 + (d_5 - d_6) \cdot 2^2 + (d_4 - d_5 + d_6) \cdot 2^1 + (d_3 - d_4 + d_5 - d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_3 + d_4 - d_5 + d_6) \cdot 2^2 + (d_1) \cdot 2^1 + (d_0) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_5 & d_4 & & \\ & d_6 & & \end{array} \]

In ℤ13

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^2 + 2^0 \\ &= d_6 \cdot 2^3 + (d_5 - d_6) \cdot 2^2 + (d_4 - d_5 + d_6) \cdot 2^1 + (d_3 - d_4 + d_5 - 2 \cdot d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_3 + d_4 - 2 \cdot d_5 + 3 \cdot d_6) \cdot 2^2 + (d_1 - d_4 + d_5 - d_6) \cdot 2^1 \\ &\hspace{34pt} + (d_0 - d_3 + d_4 - d_5 + 2 \cdot d_6) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_6 & d_5 & d_4 & d_4 \\ & d_6 & d_5 & \end{array} \]

In ℤ14

\[ \begin{align} &d_6 \cdot 2^6 + d_5 \cdot 2^5 + d_4 \cdot 2^4 + d_3 \cdot 2^3 + d_2 \cdot 2^2 + d_1 \cdot 2^1 + d_0 \cdot 2^0 \\ &÷ ~ 2^3 + 2^2 + 2^1 \\ &= d_6 \cdot 2^3 + (d_5 - d_6) \cdot 2^2 + (d_4 - d_5) \cdot 2^1 + (d_3 - d_4 + d_6) \cdot 2^0 \\ & \text{remainder } (d_2 - d_3 + d_5 - d_6) \cdot 2^2 + (d_1 - d_3 + d_5 - d_6) \cdot 2^1 + (d_0) \cdot 2^0 \end{align} \]

gives

\[ \begin{array}{ccc} d_3 & d_2 & d_1 & d_0 \\ d_6 & d_5 & d_4 & \end{array} \]

Conclusion

  • We can derive an algorithm using shifts and addition to compute the remainder by any constant.

  • This algorithm is faster than a trivial loop with subsequent additions/subtractions.

  • For many constants, the compiler knows even faster algorithms. You can look at the assembly output with godbolt (e.g. 24).

  • Another interesting reference: Stackoverflow: “Efficient (cycles wise) algorithm to compute modulo 25?”