Control flow guard

Microsoft recently introduced a new security feature called Control Flow Guard. At a basic level, this feature consists of a massive bit vector: before any indirect function call is performed, the bit vector is consulted to determine whether the target of the call is valid. The end goal is that the bit vector should specify all function entry addresses as valid and all other addresses as invalid, thereby preventing malicious calls into the middle of functions. The structure of the bit vector is interesting, but most literature seems to get it wrong. For example, the Trend Micro report on Control Flow Guard states:

The status of every 8 bytes in the process space corresponds to a bit in CFGBitmap. If there is a function starting address in each group of 8 bytes, the corresponding bit in CFGBitmap is set to 1; otherwise it is set to 0.

Every bit in the CFGBitmap represents eight bytes in the process space. So if an invalid target call address has less than eight bytes from the valid function address, the CFG will think the target call address is "valid."

Meanwhile, a POC2014 conference talk states:

One bit indicates 8bytes address and actually in most cases 16bytes
Every guard function address needs to be aligned to 0x10
If function address is not aligned to 0x10, it will use the odd bit only

Neither description is quite right. In the bit vector (which Trend Micro calls CFGBitmap), every two bits correspond to sixteen bytes. Therefore, on average, every bit corresponds to eight bytes. However, the average is very misleading here: in reality, one of those two bits corresponds to one byte (the sixteen-byte-aligned one), and the other bit corresponds to the remaining fifteen bytes. This arrangement has several benefits:

On average, only one bit is required for every eight bytes.
If functions are aligned to sixteen byte boundaries (as is common), then the bit vector can perfectly and exactly represent the set of valid function entry addresses (by marking just the one byte as valid).
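If functions aren't aligned to sixteen-byte boundaries, then the bit vector still has some benefit: only the fifteen unaligned bytes of the containing sixteen-byte block become valid targets, rather than everything.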
Computing the bit index corresponding to an address remains relatively fast (shift the address right by three bits, then conditionally do | 1 if the address isn't 16-byte aligned).
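The index computation described in the last point can be sketched in a few lines of C. This is only an illustration of the mapping, not Microsoft's actual code: the function name cfg_bit_index is hypothetical, and the 64-bit address type is an assumption made to keep the example simple.

```c
#include <stdint.h>

/* Hypothetical helper (not a Windows API): returns the index of the bit in
 * the CFG bit vector that governs a given call target address. */
static uint64_t cfg_bit_index(uint64_t address)
{
    /* Two bits per sixteen bytes, so start from one bit per eight bytes. */
    uint64_t index = address >> 3;

    /* Any address that is not 16-byte aligned uses the odd bit, which is
     * shared by the fifteen unaligned bytes of its sixteen-byte block. */
    if ((address & 0xF) != 0) {
        index |= 1;
    }
    return index;
}
```

Under this sketch, address 0x1000 maps to the even bit 0x200 on its own, every address from 0x1001 through 0x100F maps to the odd bit 0x201, and 0x1010 starts the next pair at bit 0x202.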