In our systems programming classes, we're being taught assembly language. In most of the sample programs our professor has shown in class, he uses:
XOR CX, CX
instead of
MOV CX, 0
or
OR AX, AX
JNE SOME_LABEL
instead of
CMP AX, 0
JNE SOME_LABEL
or
AND AL, 0FH ; To convert input ASCII value to numeral
; The value in AL has already been checked to lie b/w '0' and '9'
instead of
SUB AL, '0'
My question is the following: is there some kind of better performance when using AND/OR or XOR instead of the alternative (easier to understand/read) method?
Since these programs are generally shown to us during theory lecture hours, most of the class is unable to actually evaluate them verbally. Why spend 40 minutes of lecture explaining these trivial statements?
XOR CX, CX ;0x31 0xC9
Uses only two bytes: opcode 0x31 and a ModR/M byte that stores the source and destination registers (in this case the two are the same).
MOV CX, 0 ;0xB9 0x00 0x00
Needs one more byte: opcode 0xB9 (0xB8 plus the register number, so no ModR/M byte is needed) followed by a two-byte immediate filled with zeroes.
On modern processors there is no difference from a clocking perspective (both take only one clock), but mov needs three bytes while xor uses only two.
OR AX, AX ;0x0B 0xC0
again uses only an opcode byte and a ModR/M byte, while
CMP AX, 0 ;0x3D 0x00 0x00 <-- the general form is 0x81 ModR/M 0x00 0x00
uses three or four bytes. In this case it uses three bytes (opcode 0x3D plus a word immediate representing zero) because x86 has special opcodes for some operations on the accumulator register; the general word-immediate form takes four bytes (opcode, ModR/M, word immediate). It's again the same when talking about CPU clocks.
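To make the size comparison concrete, here is a small Python sketch that simply tabulates byte counts. The byte values are the 16-bit operand-size encodings as I understand them from the x86 opcode tables (an assembler may pick a different but equivalent form), and the names ENCODINGS and size_report are made up for this illustration.

```python
# Machine-code bytes for 16-bit operand size. These are assumed encodings
# taken from the usual x86 opcode tables; an assembler may emit an
# equivalent alternative form.
ENCODINGS = {
    "XOR CX, CX": bytes([0x31, 0xC9]),        # opcode + ModR/M
    "MOV CX, 0":  bytes([0xB9, 0x00, 0x00]),  # opcode 0xB8+reg + imm16
    "OR AX, AX":  bytes([0x0B, 0xC0]),        # opcode + ModR/M
    "CMP AX, 0":  bytes([0x3D, 0x00, 0x00]),  # accumulator short form + imm16
}

def size_report(encodings):
    """Return (instruction, byte count) pairs in insertion order."""
    return [(name, len(code)) for name, code in encodings.items()]

for name, size in size_report(ENCODINGS):
    print(f"{name:<12} {size} bytes")
```

In each pair the logical-operation form comes out one byte shorter, which is the whole saving being discussed here.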
There's no difference to the processor when executing
AND AL, 0x0F ;0x24 0x0F <-- again special opcode for Accumulator
and
SUB AL, '0' ;0x2C 0x30 <-- again special opcode for Accumulator
(both are two bytes), but when you subtract ASCII zero, you can't be sure that a value greater than 9 won't remain in the accumulator.
Also, AND sets OF and CF to zero, while SUB sets them according to the result. ANDing can be safer, but my personal opinion is that this usage depends on context.
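As a rough illustration of how the two conversions differ once the input is not a digit, here is a minimal Python model of the 8-bit arithmetic. The function names are hypothetical, and the model ignores flags entirely.

```python
def ascii_to_digit_and(ch):
    """Model AND AL, 0FH: keep only the low four bits of the character."""
    return ord(ch) & 0x0F

def ascii_to_digit_sub(ch):
    """Model SUB AL, '0': subtract ASCII zero with 8-bit wraparound."""
    return (ord(ch) - ord("0")) & 0xFF

# Digits first, then a few characters outside '0'..'9'.
for ch in "059:A ":
    print(repr(ch), ascii_to_digit_and(ch), ascii_to_digit_sub(ch))
```

For '0' through '9' both functions agree. For other characters the AND version always stays within 0 to 15 (so it can still exceed 9), while the SUB version can produce any byte value. That is why the range check mentioned in the question matters for either form.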
Apart from code size savings mentioned in the other answers, I thought I'd mention a few more things which you can read more about in Intel's optimization manual and Agner Fog's x86 optimization guide:
XOR REG,REG and SUB REG,REG (with REG being the same for both operands) are recognized by modern x86 processors as dependency breakers, meaning that they also serve a purpose in breaking false dependencies on previous register/flag values. Note that this doesn't necessarily apply if you clear an 8- or 16-bit register, but it will if you clear a 32-bit register.
OR AX, AX
JNE SOME_LABEL
I believe the preferred instruction would be TEST AX,AX. TEST can be macro-fused with any conditional jump (basically combined with the jump instruction into a single instruction prior to decoding) on modern x86 processors. CMP can only be fused with unsigned conditional jumps, at least prior to the Nehalem architecture. Again, I'm not sure if this is the case for 16-bit operands.
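The reason OR AX,AX, TEST AX,AX and CMP AX,0 are interchangeable in front of JNE is that all three set the zero flag exactly when AX is zero. The Python sketch below models only the zero flag for 16-bit values. The function names are invented for this example, and it says nothing about timing or fusion.

```python
MASK16 = 0xFFFF  # AX is a 16-bit register

def zf_after_or(ax):
    """ZF after OR AX, AX (result is written back to AX, unchanged)."""
    return ((ax | ax) & MASK16) == 0

def zf_after_test(ax):
    """ZF after TEST AX, AX (AND is computed, result is discarded)."""
    return ((ax & ax) & MASK16) == 0

def zf_after_cmp_zero(ax):
    """ZF after CMP AX, 0 (subtraction is computed, result is discarded)."""
    return ((ax - 0) & MASK16) == 0

def all_agree():
    """Check every 16-bit value: the three forms give the same ZF."""
    return all(
        zf_after_or(v) == zf_after_test(v) == zf_after_cmp_zero(v)
        for v in range(0x10000)
    )

print(all_agree())
```

Since JNE only looks at the zero flag, the choice between the three is about encoding size and how the processor handles them, not about behaviour.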