
Is there some benefit in the following assembly commands?

In our systems programming class, we're being taught assembly language. In most of the sample programs our professor has shown in class, he uses:

XOR CX, CX

instead of

MOV CX, 0

or

OR AX, AX
JNE SOME_LABEL

instead of

CMP AX, 0
JNE SOME_LABEL

or

AND AL, 0FH        ; To convert input ASCII value to numeral
; The value in AL has already been checked to lie b/w '0' and '9'

instead of

SUB AL, '0'

My question is: is there any performance benefit to using AND/OR or XOR instead of the alternative (easier to understand and read) method?

Since these programs are generally shown to us during theory lecture hours, most of the class is unable to actually evaluate them verbally. Why spend 40 minutes of lecture explaining these trivial statements?

hjpotter92 asked Dec 25 '22 22:12


2 Answers

XOR CX, CX  ;0x31 0xC9

Uses only two bytes: opcode 0x31 and a ModR/M byte that encodes the source and destination registers (in this case the same register).

MOV CX, 0  ;0xB9 0x00 0x00

Needs more bytes: a one-byte opcode that already encodes the destination register (0xB8 plus the register number, so 0xB9 for CX; there is no ModR/M byte) followed by a two-byte immediate filled with zeroes. On modern processors there is no difference from a clocking perspective (both take only one clock), but mov needs 3 bytes while xor uses only two.

OR AX, AX  ;0x0B 0xC0

again uses only an opcode byte and a ModR/M byte, while

CMP AX, 0  ;0x3D 0x00 0x00 <-- but the general form is 0x81 ModRM 0x00 0x00

uses three or four bytes. In this case it uses three bytes (opcode 0x3D plus a word immediate representing zero) because x86 has special short opcodes for some operations on the accumulator register. For other registers the general form takes four bytes (opcode 0x81, ModR/M, word immediate), or three bytes when the assembler picks the sign-extended byte-immediate form (opcode 0x83, ModR/M, byte immediate). It's again the same when talking about CPU clocks.
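To make the size comparison concrete, here is a small Python sketch that just tabulates hand-assembled 16-bit encodings of the four instructions discussed so far. The `ENCODINGS` table and the `encoded_size` helper are names made up for this illustration, and a real assembler may choose an equivalent alternative opcode (for example 0x33 0xC9 for XOR CX, CX), so treat the bytes as one valid encoding rather than the only one.

```python
# Hand-assembled encodings for 16-bit code (no operand-size prefix needed).
# Assumption: the assembler picks these particular opcode forms; equivalent
# alternatives of the same length exist for the register-to-register cases.
ENCODINGS = {
    "XOR CX, CX": bytes([0x31, 0xC9]),        # opcode + ModR/M
    "MOV CX, 0":  bytes([0xB9, 0x00, 0x00]),  # 0xB8+reg opcode + imm16
    "OR AX, AX":  bytes([0x0B, 0xC0]),        # opcode + ModR/M
    "CMP AX, 0":  bytes([0x3D, 0x00, 0x00]),  # accumulator short form + imm16
}


def encoded_size(mnemonic: str) -> int:
    """Return the number of bytes in the tabulated encoding."""
    return len(ENCODINGS[mnemonic])


for name, code in ENCODINGS.items():
    hex_bytes = " ".join(f"{b:02X}" for b in code)
    print(f"{name:<11} {hex_bytes:<9} ({encoded_size(name)} bytes)")
```

In both pairs the logical-operation form saves one byte over the immediate form.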

There's no difference to the processor when executing

AND AL, 0x0F  ;0x24 0x0F  <-- again special opcode for Accumulator

and

SUB AL, '0'  ;0x2C 0x30  <-- again special opcode for Accumulator

(both encode in two bytes), but when you subtract ASCII zero, nothing guarantees that the result is a valid digit: an unchecked input can leave any value in the accumulator, whereas AND AL, 0FH always leaves a value in the range 0 to 15. Also, AND clears OF and CF, while SUB sets them according to the result. ANDing can be safer, but my personal opinion is that this usage depends on context.
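The difference between the two conversions is easy to check with a short Python model. This is only a sketch of the arithmetic on the 8-bit AL register, not of the flags, and `and_0f` and `sub_zero` are names invented for the illustration.

```python
def and_0f(ch: str) -> int:
    """Model of AND AL, 0FH: keep only the low nibble of the byte."""
    return ord(ch) & 0x0F


def sub_zero(ch: str) -> int:
    """Model of SUB AL, '0': 8-bit subtraction with wraparound."""
    return (ord(ch) - ord("0")) & 0xFF


# For valid digits both methods give the same numeral.
for ch in "0123456789":
    assert and_0f(ch) == sub_zero(ch) == int(ch)

# For input that was not range-checked, the results diverge.
print(and_0f("A"), sub_zero("A"))  # 'A' is 0x41: prints 1 17
print(and_0f("/"), sub_zero("/"))  # '/' is 0x2F: prints 15 255
```

Neither method validates the input on its own. AND only bounds the result to 0 through 15, so the range check mentioned in the question is still needed.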

user35443 answered Dec 31 '22 14:12


Apart from code size savings mentioned in the other answers, I thought I'd mention a few more things which you can read more about in Intel's optimization manual and Agner Fog's x86 optimization guide:

XOR REG,REG and SUB REG,REG (with REG being the same for both operands) are recognized by modern x86 processors as dependency breakers; meaning that they also serve a purpose in breaking false dependencies on previous register/flag values. Note that this doesn't necessarily apply if you clear an 8- or 16-bit register, but it will if you clear a 32-bit register.


OR AX, AX
JNE SOME_LABEL

I believe the preferred instruction would be TEST AX,AX. TEST can be macro-fused with any conditional jump (basically combined with the jump instruction into a single instruction prior to decoding) on modern x86 processors. CMP can only be fused with unsigned conditional jumps, at least prior to the Nehalem architecture. Again, I'm not sure if this is the case for 16-bit operands.
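As a sanity check that the three forms are interchangeable for a following JNE/JE, this Python sketch models only the zero flag for every 16-bit value. The function names are made up for the illustration, and the model says nothing about timing or macro-fusion.

```python
MASK16 = 0xFFFF  # AX is a 16-bit register


def zf_after_or(ax: int) -> bool:
    """ZF after OR AX, AX (AX is rewritten with the same value)."""
    return ((ax | ax) & MASK16) == 0


def zf_after_test(ax: int) -> bool:
    """ZF after TEST AX, AX (AND without writing the result back)."""
    return ((ax & ax) & MASK16) == 0


def zf_after_cmp_zero(ax: int) -> bool:
    """ZF after CMP AX, 0 (subtraction without writing the result back)."""
    return ((ax - 0) & MASK16) == 0


all_agree = all(
    zf_after_or(v) == zf_after_test(v) == zf_after_cmp_zero(v)
    for v in range(0x10000)
)
print(all_agree)  # prints True
```

All three set ZF exactly when AX is zero, so the choice between them comes down to encoding size and how the processor handles the instruction, not to the result of the test.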

Michael answered Dec 31 '22 15:12