
Why does every modern x86 mask the shift count to the low 5 bits of CL?

I'm digging into left and right shift operations in x86 assembly, such as shl eax, cl.

From the IA-32 Intel Architecture Software Developer’s Manual 3:

All IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

I'm trying to understand the reasoning behind this. Is it because, at the hardware level, it is hard to shift a register by any count up to 32 (or 64) bits in a single cycle?

Any detailed explanation would help a lot!

No Name QA asked Dec 17 '22 13:12


1 Answer

Edited to correct statement re: 80386, which (to my surprise) did have a barrel shifter.


Happy to hear the 286 described as "modern" :-)

The 8086 ran SHL AX, CL in 8 clocks plus 4 clocks per bit shifted, and it did not mask the count. So with CL = 255 this is a seriously slow instruction: 8 + 4 × 255 = 1028 clocks!

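So the 286 did everybody a favour and limited the count by masking it to 0..31 (so a count of 32 behaves like 0), which caps the instruction at 5 + 31 = 36 clocks. For 16-bit registers that is an interesting compromise: counts of 16..31 are still honoured, even though a logical shift by 16 or more already clears a 16-bit register.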
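To make the arithmetic concrete, here is a minimal C sketch of the mask and the two clock formulas quoted above. The function names are my own, and the clock figures are simply the ones cited in this answer for register shifts by CL, not something I have measured.

```c
#include <stdint.h>

/* Count actually used by a 286-or-later shift with a 16/32-bit operand:
   only the low 5 bits of CL survive. */
unsigned masked_count(uint8_t cl)
{
    return cl & 0x1Fu;
}

/* 8086: no masking, 8 clocks plus 4 per bit shifted (figure quoted above). */
unsigned clocks_8086(uint8_t cl)
{
    return 8u + 4u * cl;
}

/* 286: count masked first, then 5 clocks plus 1 per bit shifted. */
unsigned clocks_286(uint8_t cl)
{
    return 5u + masked_count(cl);
}
```

With CL = 255, clocks_8086 gives 1028 and clocks_286 gives 36. Note that masking is not saturation: masked_count(32) is 0, not 31.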

[I found "80186/80188 80C186/80C188 Hardware Reference Manual" (order no. 270788-001), which says that this innovation first appears there. SHL et al ran in 5+n clocks (for register operations), the same as the 286. FWIW, the 186 also added PUSHA/POPA, PUSH immed., INS/OUTS, BOUND, ENTER/LEAVE, IMUL immed. and SHL/ROL etc. immed. I do not know why the 186 appears to be a non-person.]

For the 386 they kept the same 5-bit mask, which now also applies to 32-bit register shifts. I found a copy of the "80386 Programmer's Reference Manual" (order no. 230985-001), which gives a clock count of 3 for all register shifts. The "Intel 80386 Hardware Reference Manual" (order no. 231732-002), section 2.4 "Execution Unit", says that the Execution Unit includes:

• The Data Unit contains the ALU, a file of eight 32-bit general-purpose registers, and a 64-bit barrel shifter (which performs multiple bit shifts in one clock).

So, I do not know why they did not mask 32-bit shifts to 0..63. At this point I can only suggest the cock-up theory of history.
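For anyone unsure what a barrel shifter buys you, here is a rough software model in C of a generic logarithmic shifter. It is only an illustration of the idea, not the 386's documented circuit, and the name barrel_shl32 is made up. Each bit of the count selects one fixed-distance stage, so a 5-bit count needs 5 stages regardless of how far the value moves.

```c
#include <stdint.h>

/* Software model of a logarithmic (barrel-style) left shifter.
   Hypothetical helper: one conditional stage per bit of the count. */
uint32_t barrel_shl32(uint32_t value, uint8_t cl)
{
    unsigned count = cl & 31u;              /* x86-style 5-bit mask */

    for (unsigned stage = 0; stage < 5; ++stage) {
        unsigned amount = 1u << stage;      /* 1, 2, 4, 8, 16 */
        if (count & amount)                 /* in hardware: a row of 2:1 muxes */
            value <<= amount;
    }
    return value;
}
```

In this model, accepting a 6-bit count would mean one more stage (shift by 32), so the cost of a wider count is a stage, not extra time per bit shifted.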

I agree it is a shame that there isn't a (GPR) shift which returns zero for any count >= the operand size. That would require the hardware to check for any count bit set above the bottom 5 (or 6, for 64-bit operands) and return zero if one is. As a compromise, perhaps it could check just the next bit up: bit 5 for 32-bit operands, bit 6 for 64-bit operands.

[I haven't tried it, but I suspect that using PSLLQ et al is hard work -- shuffling count and value to xmm and shuffling the result back again -- compared to testing the shift count and masking the result of a shift in some branch-free fashion.]
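The branch-free alternative could look something like the following C sketch. The name shl32_zeroing is mine, and whether a given compiler really emits branch-free code for it (for example via SETcc or CMOV) is an assumption I have not verified.

```c
#include <stdint.h>

/* Left shift that yields 0 for any count >= 32, instead of wrapping the
   count the way the x86 instruction does. Hypothetical helper. */
uint32_t shl32_zeroing(uint32_t value, uint32_t count)
{
    /* All ones when count < 32, all zeros otherwise. */
    uint32_t keep = (uint32_t)0 - (uint32_t)(count < 32u);

    /* Mask the count explicitly: in C, shifting a 32-bit value by 32 or
       more is undefined behaviour, so the hardware masking cannot be
       relied upon from C source. */
    return (value << (count & 31u)) & keep;
}
```

So shl32_zeroing(1, 31) is 0x80000000, while shl32_zeroing(1, 32) is 0 rather than the 1 that a bare shl eax, cl would leave behind.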

Anyway... the reason for the behaviour appears to be history.

Chris Hall answered Dec 31 '22 13:12