What's the most efficient way to set and also to clear the zero flag (ZF) in x86-64?
Methods that work without needing a register with a known value, or without any free registers at all, are preferred. If a better method is available when those or other assumptions hold, it is also worth mentioning.
Zero Flag (Z) – After any arithmetic or logical operation, if the result is 0 (00H), the zero flag becomes set (1); otherwise it becomes reset (0).
The best way to clear the carry flag is to use the CLC instruction; and the best way to set the carry flag is to use the STC instruction.
– Zero flag (set when the result of an operation is zero).
– Carry flag (set when the result of unsigned arithmetic is too large for the destination operand or when subtraction requires a borrow).
– Sign flag (set when the high bit of the destination operand is set, indicating a negative result).
The FLAGS register is the status register that contains the current state of an x86 CPU. The size and meanings of the flag bits are architecture dependent. It usually reflects the result of arithmetic operations as well as information about restrictions placed on the CPU operation at the current time.
Clearing ZF (ZF=0) is the harder case. Use cmp between any two regs known to be not equal, or cmp reg,imm with any value some reg couldn't possibly have, e.g. cmp reg,1 with any known-zero register.
In general test reg,reg is good with any known-non-0 register value, e.g. a pointer. test rsp, rsp is probably a good choice, or even test esp, esp to save a byte. The 32-bit version will work except when the low 32 bits of RSP are zero, which can only happen if your stack is in the unusual location of spanning a 4G boundary.
I don't see a way to create ZF=0 in one instruction without a false dependency on some input reg. xor eax,eax / inc eax (or dec) will do the trick in 2 uops if you don't mind destroying a register, and it breaks false dependencies. (not doesn't set FLAGS, and neg will just do 0-0 = 0.)
or eax, -1 doesn't need any pre-condition for the register value. (False dependency, but not a true dependency, so you can pick any register even if it might be zero.) The immediate doesn't have to be -1: any non-zero immediate gives a non-zero result, and -1 itself isn't gaining you anything, so if you can make it something useful, so much the better.
or eax,-1 FLAG results: ZF=0 PF=1 SF=1 CF=0 OF=0 (AF=undefined).
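As a sanity check on those flag results, here's a minimal Python sketch of the architectural flag rules for or, test and cmp. It only models what ends up in FLAGS, not uops or dependencies, and the helper names (logic_flags, or_flags, and_flags, cmp_zf) are made up for this illustration.

```python
def logic_flags(result, bits=32):
    """FLAGS after a logical op (or/and/test/xor): CF and OF are always cleared."""
    mask = (1 << bits) - 1
    result &= mask
    return {
        "ZF": int(result == 0),
        "SF": result >> (bits - 1),                          # top bit of the result
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # even parity of low byte
        "CF": 0,
        "OF": 0,
    }

def or_flags(reg, imm, bits=32):
    """Model of: or reg, imm"""
    return logic_flags(reg | imm, bits)

def and_flags(a, b, bits=32):
    """Model of: test a, b (computes a & b for FLAGS only)"""
    return logic_flags(a & b, bits)

def cmp_zf(a, b, bits=32):
    """ZF after: cmp a, b (computes a - b for FLAGS only)"""
    return int(((a - b) & ((1 << bits) - 1)) == 0)

# or eax,-1 clears ZF whatever EAX held before.
print(or_flags(0, -1))

# Hypothetical RSP whose low 32 bits are zero: the one case where
# test esp,esp fails to clear ZF while test rsp,rsp still works.
rsp = 0x00007FFC00000000
print(and_flags(rsp, rsp, 64)["ZF"], and_flags(rsp, rsp, 32)["ZF"])

# cmp reg,1 with a known-zero register.
print(cmp_zf(0, 1))
```

The RSP value above is an arbitrary example chosen to hit the corner case, not something you'd expect to see in a real process.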
If you need to do this in a loop, you can obviously set up for it outside the loop, if you can dedicate a register to being non-zero for use with test.
For ZF=1, the least destructive option is cmp eax,eax, but it has a false dependency (I assume) and needs a back-end uop: it's not a zeroing idiom. RSP doesn't usually change much, so cmp esp, esp could be a good choice. (Unless that forces a stack-sync uop.)
Most efficient: xor-zeroing (like xor eax,eax using any free register) is definitely the most efficient way on SnB-family (same cost as a 2-byte nop, or 3-byte if it needs a REX prefix because you want to zero one of r8d..r15d): 1 front-end uop, zero back-end uops on SnB-family, and the FLAGS result is ready in the same cycle it issues. (That's relevant only in case the front-end was stalled, or some other case where a uop depending on it issues in the same cycle and there aren't any older uops in the RS with ready inputs; otherwise such uops would have priority for whichever execution port.)
Flag results: ZF=1 PF=1 SF=0 CF=0 OF=0 (AF=undefined). (Or use sub eax,eax to get well-defined AF=0. In practice modern CPUs pick AF=0 for xor-zeroing, too, so they can decode both zeroing idioms the same way. Silvermont only recognizes 32-bit operand-size xor as a zeroing idiom, not sub.)
xor-zero is very cheap on all other uarches as well, of course: no input dependencies, and doesn't need any pre-existing register value. (And thus doesn't contribute to P6-family register-read stalls). So it will be at worst tied with anything else you could do on any other uarch (where it does require an execution unit.)
(On early P6-family, before Pentium M, xor-zeroing does not break dependencies; it only triggers the special al=eax state that avoids partial-register stuff. But none of those CPUs are x86-64, all 32-bit only.)
It's pretty common to want a zeroed register for something anyway, e.g. as a sub destination for 0 - x to copy-and-negate, so take advantage of it by putting the xor-zeroing where you need it to also create a useful FLAG condition.
Interesting but probably not useful: test al, 0 is 2 bytes long. But so is cmp esp,esp.
As @prl suggested, cmp same,same with any register will work without disturbing a value. I suspect this is not special-cased as dependency-breaking the way sub same,same is on some CPUs, so pick a "cold" register. Again 2 or 3 bytes, 1 uop. It can micro-fuse with a JCC, but that would be dumb (unless the JCC is also a branch target from some other condition?)
Flag results: same as xor-zeroing.
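Downsides: as noted above for cmp eax,eax, it still costs a back-end uop and (probably) has a false dependency on the register.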
Just for fun, other as-cheap alternatives include test al, 0. That's 2 bytes for AL, 3 or 4 bytes for any other 8-bit register: (REX) + opcode + modrm + imm8. The original register value doesn't matter because an imm8 of zero guarantees that reg & 0 = 0.
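To make the "destructive vs. non-destructive" distinction concrete, here's a small Python sketch of what each ZF=1 idiom does to the register and to ZF. It models only the architectural result (not uop counts or dependency breaking), and the function names are invented for this example.

```python
MASK32 = 0xFFFFFFFF

def xor_same(reg):
    """xor reg,reg: the register becomes 0 and ZF=1."""
    result = (reg ^ reg) & MASK32
    return result, int(result == 0)

def sub_same(reg):
    """sub reg,reg: same register and ZF result as xor reg,reg."""
    result = (reg - reg) & MASK32
    return result, int(result == 0)

def cmp_same(reg):
    """cmp reg,reg: computes reg - reg for FLAGS only; the register is unchanged."""
    diff = (reg - reg) & MASK32
    return reg, int(diff == 0)

def al_and_zero(al):
    """test al,0: computes al & 0 for FLAGS only; AL is unchanged."""
    return al, int((al & 0) == 0)

# Each tuple is (register value afterwards, ZF).
for value in (0, 1, 0x12345678, MASK32):
    print(hex(value), xor_same(value), sub_same(value),
          cmp_same(value), al_and_zero(value & 0xFF))
```

Every idiom yields ZF=1 for any input; only xor and sub overwrite the register, which is why cmp same,same and test al,0 are the choices when no register is free.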
If you happen to have a 1 or -1 in a register you can destroy, 32-bit mode dec (for 1) or inc (for -1) would set ZF in only 1 byte. But in x86-64 that's at least 2 bytes. Nothing comes to mind for a 1-byte instruction in 64-bit mode that's actually efficient and sets FLAGS.
sbb same,same can set ZF=!CF (leaving CF unmodified), while setting the reg to 0 (CF=0) or -1 (CF=1). On AMD since Bulldozer (BD-family and Zen-family), this has no dependency on the GP register, only CF. But on other uarches it's not special-cased and there is a false dep on the reg. And it's 2 uops on Intel before Broadwell.
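The sbb same,same behaviour is easy to check with a tiny model: the instruction computes reg - reg - CF, so the old register value cancels out. This Python sketch (helper name sbb_same is just for illustration, 32-bit operand size assumed) returns the new register value, ZF and CF.

```python
MASK32 = 0xFFFFFFFF

def sbb_same(reg, cf):
    """Model of: sbb reg,reg with incoming carry flag cf (0 or 1)."""
    result = (reg - reg - cf) & MASK32   # 0 if CF=0, 0xFFFFFFFF (-1) if CF=1
    zf = int(result == 0)                # so ZF = !CF
    cf_out = cf                          # 0 - 0 - CF borrows exactly when CF was set
    return result, zf, cf_out

for cf in (0, 1):
    reg, zf, cf_out = sbb_same(0xDEADBEEF, cf)
    print(f"CF in={cf}: reg={reg:#010x} ZF={zf} CF out={cf_out}")
```

The starting value 0xDEADBEEF is arbitrary; the point is that it never affects the outcome, which is why the register dependency is a false one.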
To set ZF according to whether an integer register is zero (ZF = (reg == 0)), obviously the normal test reg,reg is your best bet. (Better than and reg,reg or or reg,reg, unless you're intentionally rewriting the register to avoid P6 register-read stalls.)
test or cmp).
pushf / pop rax is not terrible, but writing flags with popf is very slow (e.g. 1/20c throughput on SKL). It's microcoded because flags like IF also live in EFLAGS, and there isn't a condition-codes-only version or a special fast-path for user-space. (Or maybe 20c is the fast path.)
lahf (FLAGS->AH) / sahf (AH->FLAGS) can be useful but miss OF.
CF has clc / stc / cmc instructions. (clc is as efficient as xor-zeroing on SnB-family.)
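To show why lahf / sahf miss OF, here's a rough Python model of the bit layout: AH gets the low byte of FLAGS (SF:ZF:0:AF:0:PF:1:CF), while OF lives at bit 11. The example also sketches one way you could flip ZF through AH while keeping the other condition codes; that sequence is an illustration under the assumption that AH is free, not a recommendation over the methods above. The function names and the flags-as-integer representation are assumptions of this sketch.

```python
# FLAGS bit masks for the condition codes.
CF, PF, AF, ZF, SF, OF = 0x001, 0x004, 0x010, 0x040, 0x080, 0x800
SAHF_MASK = SF | ZF | AF | PF | CF   # the five flags that travel through AH

def lahf(flags):
    """AH = SF:ZF:0:AF:0:PF:1:CF (bit 1 always reads as 1)."""
    return (flags & SAHF_MASK) | 0x02

def sahf(flags, ah):
    """Load SF, ZF, AF, PF, CF from AH; every other FLAGS bit (including OF) is untouched."""
    return (flags & ~SAHF_MASK) | (ah & SAHF_MASK)

# Start with OF and CF set and ZF clear.
flags = OF | CF
ah = lahf(flags)            # OF is not captured: it doesn't fit in the low byte
ah |= ZF                    # like: or ah, 0x40
flags = sahf(flags, ah)     # ZF now set, CF preserved, OF left as it was
print(hex(ah), hex(flags))
```

Since sahf never writes bit 11, a save/restore pair built on lahf / sahf can't carry OF across code that changes it.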
Assuming you don’t need to preserve the values of the other flags,
cmp eax, eax