It is a party trick rather than anything practically useful, but the following four functions all compute the same thing:

``````uint8_t parity_b(const uint64_t* data, size_t n) {
uint8_t acc = 0;
for (size_t i = 0; i < n; ++i) {
for (size_t j = 0; j < 64; ++j) {
if (data[i] & (1ull << j)) acc ^= 1;
}
}
return acc;
}
``````
``````uint8_t parity_p(const uint64_t* data, size_t n) {
uint8_t acc = 0;
for (size_t i = 0; i < n; ++i) {
acc += _mm_popcnt_u64(data[i]);
}
return acc & 1;
}
``````
``````uint8_t parity_x(const uint64_t* data, size_t n) {
uint64_t acc = 0;
for (size_t i = 0; i < n; ++i) {
acc ^= data[i];
}
return _mm_popcnt_u64(acc) & 1;
}
``````
``````uint8_t parity_c(const uint64_t* data, size_t n) {
uint32_t acc = 0;
for (size_t i = 0; i < n; ++i) {
acc = _mm_crc32_u64(acc, data[i]);
}
return _mm_popcnt_u64(acc) & 1;
}
``````

Going from `parity_b` to `parity_p` is an easy step: replace the inner loop with a `popcnt`. Going from `parity_p` to `parity_x` is also an easy step: lift the `popcnt` out of the loop. Hence it is not too surprising that `parity_b` and `parity_p` and `parity_x` all compute the same thing. The `parity_c` function is like `parity_x`, except replacing `^=` with `crc`, and the surprising thing is that `parity_c` computes the same thing as all the other functions. To understand why, we need to look at the polynomial used by the `crc` instruction (the CRC32-C polynomial): This polynomial can be expressed as the product of two separate polynomials: Accordingly, CRC32-C isn't really a 32-bit checksum, it is really two separate things packed into 32 bits: a 1-bit checksum and a 31-bit checksum. The method of packing is slightly funky, and thus the method of unpacking is also slightly funky: from a mathematical point of view, to get the 1-bit checksum out of the 32-bit value, we need to view the 32-bit value as a Z2 polynomial and then compute the remainder modulo `x + 1`. This is equivalent to iteratively applying the reduction rule `x = 1` until nothing more can be reduced. If we were to start with the 4-bit polynomial `b3 * x3 + b2 * x2 + b1 * x + b0` and iteratively apply `x = 1`, we'd get `b3 * 1*1*1 + b2 * 1*1 + b1 * 1 + b0`, which is just `b3 + b2 + b1 + b0`, which is just another way of saying `popcnt` (mod 2). The same thing is true starting with a 32-bit polynomial: the remainder modulo `x + 1` is just `popcnt` (mod 2). In other words, we can use `popcnt` to unpack the 1-bit checksum out of the 32-bit value, and a 1-bit checksum is just parity. Hence why `parity_c` computes the same thing as all the other functions.

This combination of a 1-bit checksum and a 31-bit checksum can be seen as a good thing for checksumming purposes, as the two parts protect against different problems. Notably, every bit-flip in the data being checksummed will invert the 1-bit checksum, and hence any odd number of bit-flips will cause the 1-bit checksum to mismatch. The (ahem) flip-side is that the 1-bit checksum will match for any even number of bit-flips, which is where the 31-bit checksum comes in: it needs to protect against as many even number of bit-flips as possible.