For very many years now, x86 and x64 processors have been able to take an arithmetic instruction immediately followed by a conditional jump instruction, and execute the two instructions as if they were a single instruction (which is called "macro-op fusion"). When writing very hot loops in assembly code by hand, this can effectively give you an extra instruction for free.

In Intel chips, fusion first appeared in the original Core microarchitecture, launched in 2006. An improved version of fusion made its debut in the Nehalem microarchitecture (Nhlm) a few years later in 2008. The Sandy Bridge microarchitecture (SnBr) released in 2011 contained yet more improvements to fusion. Things remained fairly stable through until Cannon Lake (CnLk), but then Ice Lake in 2019 removed support for a few cases of fusion (INC and DEC instructions, and instructions with memory operands).

In AMD chips, fusion first appeared in the Bulldozer microarchitecture (Bdzr), launched in 2011. It supported fusing TEST and CMP instructions with all flag-based conditional jumps. Things remained fairly stable through until Zen 3 in 2020, which added support for a number of other instructions.

The table below summarises which instructions can fuse with which jumps on which microarchitectures. Combinations which are supported, but which are pointless in practice, are parenthesised.


TEST

AND
OR
XOR

CMP
ADD
SUB
INC
DEC
J[N]O(Core+)(SnBr+)----
J[N]SCore+SnBr+----
J[N]P
JP[EO]
Core+SnBr+----
J[N]B
J[N]AE
J[N]C
(Core+)(SnBr+)-Core+SnBr+-
J[N]BE
J[N]A
(Core+)(SnBr+)-Core+SnBr+-
J[N]Z
J[N]E
Core+SnBr+-Core+SnBr+SnBr-CnLk
J[N]L
J[N]GE
(Core+)(SnBr+)-Nhlm+SnBr+SnBr-CnLk
J[N]G
J[N]LE
Core+SnBr+-Nhlm+SnBr+SnBr-CnLk

TEST

AND
OR
XOR

CMP
ADD
SUB
INC
DEC
J[N]O(Bdzr+)(Zen3+)(Zen3+)Bdzr+Zen3+Zen3+
J[N]SBdzr+Zen3+Zen3+Bdzr+Zen3+Zen3+
J[N]P
JP[EO]
Bdzr+Zen3+Zen3+Bdzr+Zen3+Zen3+
J[N]B
J[N]AE
J[N]C
(Bdzr+)(Zen3+)(Zen3+)Bdzr+Zen3+(Zen3+)
J[N]BE
J[N]A
(Bdzr+)(Zen3+)(Zen3+)Bdzr+Zen3+(Zen3+)
J[N]Z
J[N]E
Bdzr+Zen3+Zen3+Bdzr+Zen3+Zen3+
J[N]L
J[N]GE
(Bdzr+)(Zen3+)(Zen3+)Bdzr+Zen3+Zen3+
J[N]G
J[N]LE
Bdzr+Zen3+Zen3+Bdzr+Zen3+Zen3+
The following paragraphs describe the various conditional jumps, and in particular what it means to fuse with ADD.
Jumps which depend only upon the result bits (for TEST, the result is what AND would give, and for CMP, the result is what SUB would give):
DescriptionFlagsFusion (SnBr)
JSJump if signSF = 1TEST AND
JNSJump if not signSF = 0TEST AND
JP
JPE
Jump if parity
Jump if parity even
PF = 1TEST AND
JNP
JPO
Jump if not parity
Jump if parity odd
PF = 0TEST AND
JZ
JE
Jump if zero
Jump if equal
ZF = 1TEST AND INC DEC
CMP ADD SUB
JNZ
JNE
Jump if not zero
Jump if not equal
ZF = 0TEST AND INC DEC
CMP ADD SUB
SF is a copy of the most significant bit of the result (which for conceptually signed results, indicates a result less than zero), PF is set based on the population count of the low 8 bits of the result (PF = 1 implies that said population count was even), and ZF is set if all bits of the result are zero (which for CMP and SUB implies that the two operands were equal).
Jumps which conceptually operate on unsigned operands:
DescriptionFlagsFusion (SnBr)
JB
JNAE
JC
Jump if below
Jump if not above or equal
Jump if carry
CF = 1CMP ADD SUB
JNB
JAE
JNC
Jump if not below
Jump if above or equal
Jump if not carry
CF = 0CMP ADD SUB
JBE
JNA
Jump if below or equal
Jump if not above
CF = 1 or ZF = 1CMP ADD SUB
JNBE
JA
Jump if not below or equal
Jump if above
CF = 0 and ZF = 0CMP ADD SUB
For an N-bit result, CF is the (N+1)th bit. Fusion with TEST and AND is technically possible, but CF is always zero following these instructions, making it pointless. The "above" and "below" descriptions are from the perspective of CMP or SUB with unsigned operands. To extend the description to ADD, consider the instruction to be taking two N-bit unsigned operands, zero-extending both to (N+1) bits, and producing an (N+1)-bit signed result. Then "above" and "below" are whether this (N+1)-bit signed result is above or below zero.
Jumps which conceptually operate on signed operands:
DescriptionFlagsFusion (SnBr)
JOJump if overflowOF = 1-
JNOJump if not overflowOF = 0-
JL
JNGE
Jump if less
Jump if not greater or equal
SF ≠ OFINC DEC
CMP ADD SUB
JNL
JGE
Jump if not less
Jump if greater or equal
SF = OFINC DEC
CMP ADD SUB
JNG
JLE
Jump if not greater
Jump if less or equal
SF ≠ OF or ZF = 1TEST AND INC DEC
CMP ADD SUB
JG
JNLE
Jump if greater
Jump if not less or equal
SF = OF and ZF = 0TEST AND INC DEC
CMP ADD SUB
Conceptually, the arithmetic instructions take two N-bit signed operands (the 2nd operand being 1 for INC and DEC), sign extend both to (N+1)-bits, compute an (N+1)-bit signed result, then OF is set to the exclusive-or of the two most signicant bits of those (N+1). This means that SF ≠ OF corresponds to the most significant bit of this signed (N+1)-bit result (in the same way that CF corresponds to the most significant bit of the (N+1)-bit result when the operands are zero-extended). In other words, OF is set if sign-extending the N-bit result to (N+1)-bits gives a different result to sign-extending the operands to (N+1)-bits and then performing (N+1)-bit arithmetic. Fusion with TEST and AND is technically possible for all these jumps, but OF is always zero following TEST or AND, making many combinations pointless. The "greater" and "less" descriptions are from the perspective of CMP or SUB with signed operands. To extend the description to ADD, consider the instruction to be taking two N-bit signed operands, sign-extending both to (N+1) bits, and producing an (N+1)-bit signed result. Then "greater" and "less" are whether this (N+1)-bit signed result is greater or less than zero. The INC and DEC instructions are equivalent to ADD and SUB with the 2nd operand being 1 (except for INC and DEC not touching CF).
In the context of optimising hot loops, it is common to be iterating over the half-open range [0, N), by some step S. Given all the references above to comparing to zero, it is often beneficial to transform this range to [-N, 0), as then the loop-control instructions can be ADD reg, S followed by a conditional jump, rather than ADD reg, S followed by CMP reg, N followed by a conditional jump. In the case of fusing ADD reg, S with a conditional jump, if N is known to be non-zero, then the conditional jump can be JA or JAE (or synonyms thereof), relying on CF = 1 occuring when the counter transitions from negative to non-negative. If N might zero, and a single iteration of the loop is desired (or harmless) even when N is zero, then the conditional jump cannot be JA or JAE, but instead can be JL (or a synonym thereof), relying on sign extension of both operands to (N+1) bits and then jumping if the (N+1)-bit result is less than zero.