Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Setting and clearing the zero flag in x86

What's the most efficient way to set and also to clear the zero flag (ZF) in x86-64?

Methods that work without the need for a register with a known value, or without any free registers at all are preferred, but if a better method is available when those or other assumptions are true it is also worth mentioning.

like image 475
BeeOnRope Avatar asked Feb 03 '19 02:02

BeeOnRope


People also ask

How do you reset a Zero flag?

Zero Flag (Z) – After any arithmetical or logical operation if the result is 0 (00)H, the zero flag becomes set i.e. 1, otherwise it becomes reset i.e. 0. 00H zero flags is 1.

How do you clear a flag in assembly?

The best way to clear the carry flag is to use the CLC instruction; and the best way to set the carry flag is to use the STC instruction.

What is Zero flag in assembly language?

– Zero flag (set when the result of an operation is zero). – Carry flag (set when the result of unsigned arithmetic is too large for the destination operand or when subtraction requires a borrow). – Sign flag (set when the high bit of the destination operand is set indicating a negative result).

What are the flags in x86?

The FLAGS register is the status register that contains the current state of a x86 CPU. The size and meanings of the flag bits are architecture dependent. It usually reflects the result of arithmetic operations as well as information about restrictions placed on the CPU operation at the current time.


2 Answers

ZF=0

This is harder. cmp between any two regs known to be not equal. Or cmp reg,imm with any value some reg couldn't possibly have. e.g. cmp reg,1 with any known-zero register.

In general test reg,reg is good with any known-non-0 register value, e.g. a pointer.
test rsp, rsp is probably a good choice, or even test esp, esp to save a byte will work except if your stack is in the unusual location of spanning the 4G boundary.

I don't see a way to create ZF=0 in one instruction without a false dependency on some input reg. xor eax,eax / inc eax or dec will do the trick in 2 uops if you don't mind destroying a register, breaking false dependencies. (not doesn't set FLAGS, and neg will just do 0-0 = 0.)

or eax, -1 doesn't need any pre-condition for the register value. (False dependency, but not a true dependency so you can pick any register even if it might be zero.) It doesn't have to be -1, it's not gaining you anything so if you can make it something useful so much the better.

or eax,-1 FLAG results: ZF=0 PF=1 SF=1 CF=0 OF=0 (AF=undefined).

If you need to do this in a loop, you can obviously set up for it outside the loop, if you can dedicate a register to being non-zero for use with test.


ZF=1

Least destructive: cmp eax,eax - but has a false dependency (I assume) and needs a back-end uop: not a zeroing idiom. RSP doesn't usually change much so cmp esp, esp could be a good choice. (Unless that forces a stack-sync uop).

Most efficient: xor-zeroing (like xor eax,eax using any free register) is definitely the most efficient way on SnB-family (same cost as a 2-byte nop, or 3-byte if it needs a REX because you want to zero one of r8d..r15d): 1 front-end uop, zero back-end uops on SnB-family, and the FLAGS result is ready in the same cycle it issues. (Relevant only in case the front-end was stalled, or some other case where a uop depending on it issues in the same cycle and there aren't any older uops in the RS with ready inputs, otherwise such uops would have priority for whichever execution port.)

Flag results: ZF=1 PF=1 SF=0 CF=0 OF=0 (AF=undefined). (Or use sub eax,eax to get well-defined AF=0. In practice modern CPUs pick AF=0 for xor-zeroing, too, so they can decode both zeroing idioms the same way. Silvermont only recognizes 32-bit operand-size xor as a zeroing idiom, not sub.)

xor-zero is very cheap on all other uarches as well, of course: no input dependencies, and doesn't need any pre-existing register value. (And thus doesn't contribute to P6-family register-read stalls). So it will be at worst tied with anything else you could do on any other uarch (where it does require an execution unit.)

(On early P6-family, before Pentium M, xor-zeroing does not break dependencies; it only triggers the special al=eax state that avoids partial-register stuff. But none of those CPUs are x86-64, all 32-bit only.)

It's pretty common to want a zeroed register for something anyway, e.g. as a sub destination for 0 - x to copy-and-negate, so take advantage of it by putting the xor-zeroing where you need it to also create a useful FLAG condition.

Interesting but probably not useful: test al, 0 is 2 bytes long. But so is cmp esp,esp.


As @prl suggested, cmp same,same with any register will work without disturbing a value. I suspect this is not special-cased as dependency breaking the way sub same,same is on some CPUs, so pick a "cold" register. Again 2 or 3 bytes, 1 uop. It can micro-fuse with a JCC, but that would be dumb (unless the JCC is also a branch target from some other condition?)

Flag results: same as xor-zeroing.

Downsides:

  • (probably) false dependency
  • on P6-family can contribute to a register-read stall, so pick a cold register you're already reading in nearby instructions.
  • needs a back-end execution unit on SnB-family

Just for fun, other as-cheap alternatives include test al, 0. 2 bytes for AL, 3 or 4 bytes for any other 8-bit register. (REX) + opcode + modrm + imm8. The original register value doesn't matter because an imm8 of zero guarantees that reg & 0 = 0.


If you happen to have a 1 or -1 in a register you can destroy, 32-bit mode inc or dec would set ZF in only 1 byte. But in x86-64 that's at least 2 bytes. Nothing comes to mind for a 1-byte instruction in 64-bit mode that's actually efficient and sets FLAGS.


ZF=!CF

sbb same,same can set ZF=!CF (leaving CF unmodified), and setting the reg to 0 (CF=0) or -1 (CF=1). On AMD since Bulldozer (BD-family and Zen-family), this has no dependency on the GP register, only CF. But on other uarches it's not special cased and there is a false dep on the reg. And it's 2 uops on Intel before Broadwell.


ZF=bool(integer register)

To set ZF=integer_reg, obviously the normal test reg,reg is your best bet. (Better than and reg,reg or or reg,reg, unless you're intentionally rewriting the register to avoid P6 register-read stalls.)


Other FLAGS conditions:

  • How to read and write x86 flags registers directly?
  • One instruction to clear PF (Parity Flag) -- get odd number of bits in result register (not possible without pre-existing register values for test or cmp).
  • How can I set or clear overflow flag in x86 assembly? (e.g. for the start of an ADOX chain.)
  • pushf/pop rax is not terrible, but writing flags with popf is very slow (e.g. 1/20c throughput on SKL). It's microcoded because flags like IF also live in EFLAGS, and there isn't a condition-codes-only version or a special fast-path for user-space. (Or maybe 20c is the fast path.)
  • lahf (FLAGS->AH) / sahf (AH->FLAGS) can be useful but miss OF.

CF has clc/stc/cmc instructions. (clc is as efficient as xor-zeroing on SnB-family.)

like image 52
Peter Cordes Avatar answered Nov 26 '22 08:11

Peter Cordes


Assuming you don’t need to preserve the values of the other flags,

cmp eax, eax
like image 34
prl Avatar answered Nov 26 '22 06:11

prl