On a 64-bit x86 CPU, we normally load the number -1 into a register like this:
mov rdx, -1 // 48BAFFFFFFFFFFFFFFFF
... this encoding takes 10 bytes.
Another way is:
xor rdx, rdx // 4831D2
dec rdx // 48FFCA
... this sequence takes only 6 bytes.
EDIT:
As Jens Björnhager says (and I have tested it), xor edx, edx also clears the whole rdx register:
xor edx, edx // 31D2
dec rdx // 48FFCA
... this sequence takes only 5 bytes.
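To show why the 32-bit xor is enough, here is a minimal Python model of the register, not real machine code. It assumes the usual x86-64 rule that a write to a 32-bit register zeroes the upper 32 bits of the full 64-bit register; the helper names write32 and dec64 are made up for this sketch.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def write32(value):
    """Model a write to edx: the result is zero-extended into rdx."""
    return value & MASK32

def dec64(value):
    """Model dec rdx: subtract 1 with 64-bit wraparound."""
    return (value - 1) & MASK64

rdx = 0xDEADBEEFCAFEBABE       # arbitrary stale contents
edx = rdx & MASK32             # low half, as seen through edx
rdx = write32(edx ^ edx)       # xor edx, edx -> whole rdx becomes 0
rdx = dec64(rdx)               # dec rdx      -> wraps around to all ones
print(hex(rdx))
```

In this model the stale upper half disappears after the xor, so the dec leaves all 64 bits set, which is -1 in two's complement.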
EDIT:
Alex found another solution:
mov rdx, -1 // 48C7C2FFFFFFFF
... this encoding takes only 7 bytes. But how do I tell the compiler to use the shorter encoding (without using DB)?
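As a rough check that the 7-byte form really loads the same value, the sketch below pulls the immediate out of each hex string and sign-extends it to 64 bits. It assumes the immediate is simply the trailing bytes of the encoding in little-endian order, which holds for these two forms; decode_imm is a hypothetical helper, not part of any assembler.

```python
MASK64 = (1 << 64) - 1

def decode_imm(encoding_hex, imm_bytes):
    """Return the trailing immediate, sign-extended to 64 bits."""
    raw = bytes.fromhex(encoding_hex)
    imm = int.from_bytes(raw[-imm_bytes:], "little", signed=True)
    return imm & MASK64

long_form = decode_imm("48BAFFFFFFFFFFFFFFFF", 8)   # full 64-bit immediate
short_form = decode_imm("48C7C2FFFFFFFF", 4)        # 32-bit immediate, sign-extended
print(hex(long_form), hex(short_form))
```

Both calls give the same 64-bit pattern, so the short form only works for values that fit in a signed 32-bit immediate; -1 is one of them.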
...
Which is faster, and which is more economical?
There's one shorter than all of the ones mentioned, at 4 bytes: 4883CAFF OR rdx,-1
It has the nasty property of carrying a false dependency on the old value of rdx on all architectures I know of, but it shouldn't go unmentioned IMO. There are legitimate reasons to use it. For example, if the result is not needed until quite a lot later, and it's in a loop which would otherwise not fit in four 16-byte blocks. Also, if speed is of no big concern for a particular piece of code, one might as well not waste precious cache space. It could also be used for alignment reasons, but it would almost certainly be faster to pad to the next higher alignment instead.
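For a quick size comparison, here is a small Python tally of the encodings quoted in this thread. It only counts bytes from the hex strings and says nothing about speed or dependencies; the labels and the encoded_size helper are mine, just for display.

```python
# Hex encodings copied from the thread; each list is one instruction sequence.
CANDIDATES = {
    "mov rdx, -1 (imm64)": ["48BAFFFFFFFFFFFFFFFF"],
    "xor rdx, rdx / dec rdx": ["4831D2", "48FFCA"],
    "xor edx, edx / dec rdx": ["31D2", "48FFCA"],
    "mov rdx, -1 (imm32)": ["48C7C2FFFFFFFF"],
    "or rdx, -1": ["4883CAFF"],
}

def encoded_size(hex_chunks):
    """Total number of bytes across all instructions in the sequence."""
    return sum(len(bytes.fromhex(chunk)) for chunk in hex_chunks)

sizes = {label: encoded_size(chunks) for label, chunks in CANDIDATES.items()}
for label, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{size:2d} bytes  {label}")
```

The OR form comes out smallest, which is the only thing it has going for it given the false dependency.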
As for telling the compiler this, I haven't got a clue.