Basically, as title says. I would like to know of a way to compute floor(log2(x / y))
, where x
and y
are nonzero unsigned machine integers, in as few cycles as possible (avoiding as much as possible the use of things like branches, memory bandwidth, division, etc that are expensive in tiny sections of code like this). The exact (integer) answer is required here. I was thinking about how to optimize the outer loop of Adaptive Shivers Sort by computing this efficiently, as it requires computing floor(log2(r / c))
, where r
is a run length and c
a metaparameter to the algorithm; solutions that assume x <= y
would work for the offline version of this sort, where c
is chosen to be equal to the length of the input, but general solutions could be useful in other settings.
You can assume the use of PopCount
and CountLeadingZeros/CountTrailingZeros
, common SSE-style instructions, or even floating point computation--but it needs to be something which processors can do in just a few cycles.
How about something like this, inspired partly by a comment by NXTangl? Apply clz
to both x
and y
and shift them both so their leading bit is in the top bit position (31 or 63). Let k
be the difference between these two shift amounts. Now either k
or k-1
is the result you're looking for, and you can distinguish the cases by which of the shifted value is greater.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With