Computing parity using CRC

It is a party trick rather than anything practically useful, but the following four functions all compute the same thing:

uint8_t parity_b(const uint64_t* data, size_t n) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < 64; ++j) {
            if (data[i] & (1ull << j)) acc ^= 1;
        }
    }
    return acc;
}

uint8_t parity_p(const uint64_t* data, size_t n) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += _mm_popcnt_u64(data[i]);
    }
    return acc & 1;
}

uint8_t parity_x(const uint64_t* data, size_t n) {
    uint64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc ^= data[i];
    }
    return _mm_popcnt_u64(acc) & 1;
}

uint8_t parity_c(const uint64_t* data, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc = _mm_crc32_u64(acc, data[i]);
    }
    return _mm_popcnt_u64(acc) & 1;
}

Going from parity_b to parity_p is an easy step: replace the inner loop with a popcnt. Going from parity_p to parity_x is also an easy step: lift the popcnt out of the loop. Hence it is not too surprising that parity_b and parity_p and parity_x all compute the same thing. The parity_c function is like parity_x, except replacing ^= with crc, and the surprising thing is that parity_c computes the same thing as all the other functions. To understand why, we need to look at the polynomial used by the crc instruction (the CRC32-C polynomial):

$x^{32} + x^{28} + x^{27} + x^{26} + x^{25} + x^{23} + x^{22} + x^{20} + x^{19} + x^{18} + x^{14} + x^{13} + x^{11} + x^{10} + x^{9} + x^{8} + x^{6} + 1$

This polynomial can be expressed as the product of two separate polynomials:

$(x + 1)(x^{31} + x^{30} + x^{29} + x^{28} + x^{26} + x^{24} + x^{23} + x^{21} + x^{20} + x^{18} + x^{13} + x^{10} + x^{8} + x^{5} + x^{4} + x^{3} + x^{2} + x^{1} + 1)$

Accordingly, CRC32-C isn't really a 32-bit checksum, it is really two separate things packed into 32 bits: a 1-bit checksum and a 31-bit checksum. The method of packing is slightly funky, and thus the method of unpacking is also slightly funky: from a mathematical point of view, to get the 1-bit checksum out of the 32-bit value, we need to view the 32-bit value as a Z₂ polynomial and then compute the remainder modulo x + 1. This is equivalent to iteratively applying the reduction rule x = 1 until nothing more can be reduced. If we were to start with the 4-bit polynomial b₃ * x³ + b₂ * x² + b₁ * x + b₀ and iteratively apply x = 1, we'd get b₃ * 1*1*1 + b₂ * 1*1 + b₁ * 1 + b₀, which is just b₃ + b₂ + b₁ + b₀, which is just another way of saying popcnt (mod 2). The same thing is true starting with a 32-bit polynomial: the remainder modulo x + 1 is just popcnt (mod 2). In other words, we can use popcnt to unpack the 1-bit checksum out of the 32-bit value, and a 1-bit checksum is just parity. Hence why parity_c computes the same thing as all the other functions.

This combination of a 1-bit checksum and a 31-bit checksum can be seen as a good thing for checksumming purposes, as the two parts protect against different problems. Notably, every bit-flip in the data being checksummed will invert the 1-bit checksum, and hence any odd number of bit-flips will cause the 1-bit checksum to mismatch. The (ahem) flip-side is that the 1-bit checksum will match for any even number of bit-flips, which is where the 31-bit checksum comes in: it needs to protect against as many even number of bit-flips as possible.