Last time, we had a need to compute `mod_P(A * x32)` where `A` was a polynomial with degree at most 31, and `mod_P` took the remainder after dividing by a polynomial `P` with degree 32. The traditional Barrett reduction approach involved finding the quotient and then calculating the remainder from that:

``````Q = ((A * x32) * (x63 // P)) // x63
R =  (A * x32) - Q * P
return R``````

Via a combination of the `* x32`, and the degree of `P` being 32, and the context being Z2 polynomials, this simplified to:

``````Q = ((A * x32) * (x63 // P)) // x63
return (Q * P)[0:32]``````

The resultant C code (for bit-reversed polynomials) was very short:

``````uint32_t mul_x32_mod_P(uint32_t A) {
__m128i k = _mm_setr_epi32(0, 0xdea713f1, 0x05ec76f1, 1);
__m128i a = _mm_cvtsi32_si128(A);
__m128i b = _mm_clmulepi64_si128(a, k, 0x00);
__m128i c = _mm_clmulepi64_si128(b, k, 0x10);
return _mm_extract_epi32(c, 2);
}
``````

Though our particular case permitted this simplification, it is not permitted in general. In Faster Remainder by Direct Computation (D. Lemire, O. Kaser, and N. Kurz, 2018), a different approach was presented: the remainder can be computed directly, rather than computing the quotient and getting the remainder from that. Said paper is about integers, and its main Theorem 1 is effectively:

Provided that `d in [1, 2N)` and `n in [0, 2N)` and `F ≥ N + log2(d)`, then:
`n % d` equals `(((n * ceil(2F / d)) % 2F) * d) // 2F`.

Translated from the language of integers to the language of polynomials, this theorem would be:

Provided that `S`, `P`, `T` are polynomials satisfying `deg((S * P) // T) ≤ 0`, then:
`S % P` equals `(((S * (T // P)) % T) * P) // T`.

This lets us turn `% P` into multiplication by `T // P` followed by `% T` followed by `* P` followed by `// T`. The `% T` and `// T` steps are trivial if `T` is chosen to be `xn` (with `n ≥ deg(S) + deg(P)` to make the precondition hold), and `T // P` can be precomputed if `P` is fixed.

It turns out that the translated theorem is true, and that the proof isn't too hard. The proof builds upon the Barrett reduction property, which is:

Provided that `deg(S // T) ≤ 0`, `S // P` equals `(S * (T // P)) // T`.

We start with the complicated side of our (supposed) equality:

``````(((S * (T // P)) % T) * P) // T
``````

We're going to be overwhelmed by parentheses if we're not careful, so I'm going to say that a sequence of mixed `*` and `%` and `//` operations always happens left-to-right, thus allowing some of the parentheses to be dropped:

``````S * (T // P) % T * P // T
``````

From here, the `%` can be expanded: (`F % G` equals `F - (F // G * G)`)

``````((S * (T // P)) - (S * (T // P) // T * T)) * P // T
``````

Then push the `-` to the top level:

``````(S * (T // P) * P // T) - (S * (T // P) // T * T * P // T)
``````

Making use of our `deg((S * P) // T) ≤ 0` precondition, by the Barrett reduction property, `(S * P) // P` equals `(S * P) * (T // P) // T`:

``````(S * P // P) - (S * (T // P) // T * T * P // T)
``````

We can then simplify away `* G // G` for any `G`:

``````S - (S * (T // P) // T * P)
``````

Our `deg((S * P) // T) ≤ 0` precondition implies `deg(S // T) ≤ 0`, so by the Barrett reduction property, `S // P` equals `S * (T // P) // T`:

``````S - (S // P * P)
``````

Which is the definition of:

``````S % P
``````

Thus our supposed equality is an actual equality.