Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Confusion about bsr and lzcnt

Tags:

x86

assembly

bmi

I'm a bit confused about both instructions. First let's discard the special case when the scanned value is 0 and the undefined/bsr or bitsize/lzcnt result - this difference is clear and not part of my question.

Let's take the binary value 0001 1111 1111 1111 1111 1111 1111 1111

According to Intel's spec the result for lzcnt is 3

According to Intel's spec the result for bsr is 28

lzcnt counts, bsr returns the index or distance from bit 0 (which is the lsb).

How can both instructions be the same and how can lzcnt be emulated as bsr in case there's no BMI on the CPU available? Or is bit 0 in case of bsr the msb? Both "code operations" in Intel's spec are different too, one counts or indexes from the left, the other from the right.

Maybe someone can shed some light on this, I have no CPU without BMI/lzcnt instruction to test if the fallback to bsr works with the same result (as the special case of value 0 to scan never happens).

like image 718
hopperpl Avatar asked Sep 05 '14 10:09

hopperpl


2 Answers

LZCNT gives the number of leading zero bits. BSR gives the bit index of the most significant 1 bit. So they do effectively the same thing for the non-zero case, except the result is interpreted differently. Therefore you can just subtract the BSR result from 31 to get the same behaviour as with LZCNT, i.e. LZCNT == (31 - BSR).

like image 91
Paul R Avatar answered Nov 06 '22 11:11

Paul R


To be clear, there is no working fallback from lzcnt to bsr. What happened is that Intel used the previously redundant sequence rep bsr to encode the new lzcnt instruction. Using a redudant rep prefix for bsr was generally defined to be ignored, but with the caveat that it may decode differently on future CPUs1.

So if you happen to execute lzcnt on a CPU that doesn't support it, it will execute as bsr. Of course, this fallback is not exactly intentional, and it gives the wrong result (as Paul R points out they look at the same bit but report it differently): it is just a consequence of the way the new instruction was encoded and how pointless rep prefixes were treated by prior CPUs. So the world fallback is pretty much entirely inappropriate for lzcnt and bsr.

The situation is more subtle for the case of tzcnt and bsf. It uses the same encoding trick: tzcnt has the same encoding as rep bsf, but here the "fallback" mostly works since tzcnt returns the same value as bsf for all inputs except zero. For zero inputs tzcnt returns 32, but bsf leaves the destination undefined.

You can't really use even this fallback though: if you never have zero inputs you might as well just use bsf, saving a byte and being compatible with a couple decades of CPUs, and if you do have zero inputs the behavior differs.

So the behavior is perhaps better classified as trivia than a fallback...


1 Normally this would more or less be esoterica, but you could for example use rep prefixes are where they have no functional effect to lengthen instructions to help align subsequent code without inserting explicit nop instructions. Given the "may decode differently in the future" this would be dangerous when compiling code which may run on any future CPU.

like image 12
BeeOnRope Avatar answered Nov 06 '22 10:11

BeeOnRope