
Can a shift using the CL register result in a partial register stall?

Can a variable shift generate a partial register stall (or register recombining µops) on ecx? If so, on which microarchitecture(s)?

I have tested this on Core 2 (65 nm), and it seems to read only cl.

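_shiftbench:
    push rbx                ; rbx is callee-saved, and bl is written below
    mov edx, -10000000      ; loop counter: 10 million iterations
    mov ecx, 5              ; full write of ecx before the loop
  _shiftloop:
    mov bl, 5               ; replace by cl to see possible recombining
    shl eax, cl             ; variable shift: the count is read from cl
    add edx, 1
    jnz _shiftloop          ; loop until edx reaches zero
    pop rbx
    ret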

Replacing mov bl, 5 with mov cl, 5 made no difference. It would have if register recombining were taking place, as can be demonstrated by replacing shl eax, cl with add eax, ecx: in my tests, the add version ran 2.8x slower when writing to cl instead of bl.


Test results:

  • Merom: no stall observed
  • Penryn: no stall observed
  • Nehalem: no stall observed

Update: the new shrx group of shifts (BMI2, introduced with Haswell) does show that stall. Their shift-count operand is written as a full 32- or 64-bit register rather than an 8-bit one, so that might have been expected, but the textual representation really doesn't say anything about such microarchitectural details.

harold asked Oct 27 '12 20:10


1 Answer

As currently phrased (“Can a shift using the CL register …”), the question's title contains its own answer: on a modern processor there is never a partial register stall on CL, because CL can never be recombined from something smaller.

Yes, the processor knows that the amount you are shifting by is effectively contained in CL; to be precise, in the 5 or 6 least significant bits of CL (5 bits for 8-, 16- and 32-bit operands, 6 bits for 64-bit operands). One way it could have stalled on ECX was if the granularity at which it tracked instruction dependencies did not go below full registers. This worry is obsolete, though: the newest Intel processor that would have considered the whole ECX register as a dependency was the Pentium 4. See Agner Fog's unofficial optimization manual, page 121. But even on the P4 this would not be called a partial register stall; the program could only be the victim of a false dependency (say, if CH was modified just before the shift).
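To make "effectively contained in CL" concrete, here is a minimal C sketch of the *architectural* result of `shl eax, cl` and `shl rax, cl`. It models only the documented count masking, not any microarchitecture's dependency tracking, and the function names `shl32_by_cl`, `shl64_by_cl` and `write_ch` are hypothetical helpers chosen for illustration. It shows that bits of ECX above the low 5 (or 6) bits of CL, including all of CH, can never change the result, which is why hardware that tracks dependencies below full-register granularity has no reason to wait on them.

```c
#include <assert.h>   /* used by the checks in the usage note */
#include <stdint.h>

/* Hypothetical model of `shl eax, cl` (32-bit operand size):
   the count is the low byte of ECX (CL), masked to 5 bits. */
static uint32_t shl32_by_cl(uint32_t eax, uint32_t ecx)
{
    uint8_t cl = (uint8_t)(ecx & 0xFF);   /* CL = bits 0..7 of ECX */
    return eax << (cl & 0x1F);            /* effective count 0..31 */
}

/* Hypothetical model of `shl rax, cl` (64-bit operand size):
   the count is CL masked to 6 bits. */
static uint64_t shl64_by_cl(uint64_t rax, uint64_t rcx)
{
    uint8_t cl = (uint8_t)(rcx & 0xFF);
    return rax << (cl & 0x3F);            /* effective count 0..63 */
}

/* Hypothetical model of `mov ch, imm8`: replaces bits 8..15 of ECX
   and leaves CL untouched. */
static uint32_t write_ch(uint32_t ecx, uint8_t ch)
{
    return (ecx & ~UINT32_C(0xFF00)) | ((uint32_t)ch << 8);
}
```

For example, `shl32_by_cl(1, 5)` and `shl32_by_cl(1, write_ch(5, 0xAB))` both give 32, because writing CH leaves CL at 5; `shl32_by_cl(1, 37)` also gives 32 since 37 & 31 == 5, whereas `shl64_by_cl(1, 37)` gives 2^37.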

Pascal Cuoq answered Sep 29 '22 00:09