What is a circular shift with extend used for?

Tags:

I remember in an assembly class, we learned the m68k processor, and there were 3 kinds of shifts you could do. Linear shift, circular shift, and circular shift with extend.

The last one, circular shift with extend, basically rotates all bits left or right but it puts the outermost one into an extend bit before it would be moved to the beginning again (if you ever shifted by 1 again).

I drew a little pic:

enter image description here

Basically, a 33rd bit is used in the circular shifting, but doesn't show up in the 32-bit word of course. The 33rd bit is the X flag of the processor, which stands for extend. You could just as easily use any status flag, such as the carry flag, but I guess the motorola guys wanted to preserve that flag so it doesn't get overwritten, in case you need the carry flag for its normal duties in the middle of some algorithm that also needs rotate with extend.

Anyway, what is the purpose of rotate with extend? What is it used for? What is it needed for? It seems so strange. Why in the world do you need a 33rd bit?

I've read this and this, two related questions, but they didn't talk about the circular shift with extend.

I'm aware of some uses for normal shifts. Basically to divide by two, or test divisibility, and permutate the bits for randomness. Stuff like that. But I cannot think of why you would need some extended bit inserted into the rotate but doesn't come out in the result.

EDIT: I'm interested in any use of it, modern or old, and doesn't matter if it's on the m68k or not. The m68k is just the first placed I encountered it (and I never even used it there).

781

asked Dec 03 '17 18:12

DrZ214

2 Answers

Suppose you want to shift a 32 bit word to the right but all you have are 16 bit registers. To do this you have to shift the two 16 bit parts of the 32 bit word to the right and transfer the bit that was shifted out of the high word into the low word.

If you just have logical shifts, this is cumbersome to do as you have to manually fix up that bit. The rotate-through-carry instruction allows you to keep the bit that needs to be transferred in the carry flag and shift it in in one go. The rotate-through-carry instruction places the bit shifted out in the carry flag so you can easily chain this together to right-shift arbitrarily sized data.

178

answered Sep 29 '22 14:09

fuz

On x86 (and most architectures that have this instruction), the extra bit is the carry flag, and lots of things can set that flag. Rotate-through-carry left or right lets you shift the carry bit back into some other register. Interesting that m68k uses a different flag for extended-rotate.

I'm not very familiar with m68k anymore, so I'll mostly talk about other arches. (But apparently that's what you want :)

Instructions like this are often available on microcontrollers that are much less capable than x86 or m68k. Or with limited opcode space (and decoding complexity), some CPUs only have a rotate-through-carry by 1 and not regular shift instructions. If you want to shift in a zero, make sure the flag is clear first.

8051 is like this: only rotate left/right by 1, and rotate-with-carry left/right by 1, not shift. See rlc in the ISA ref manual. When possibly, avoid the clr instruction when you want to shift in a zero by putting rlc after something else that leaves Carry cleared.

I think it's typical for extended circular shift to use the carry flag, rather than its own X bit the way m68k does.

Anyway, extended-precision rotate is kind of traditional / expected for CPUs, but has more uses on more limited CPUs.

For a register, rcl reg, 1 is the same operation as adc reg,reg: shift the old contents left by 1, and set the low bit to CF. The bit shifted out by the rotate or adc becomes the new value of CF. So RCL is only a non-redundant part of an instruction set if it's available with a memory operand, or (for weird cases) with a count greater than 1. (The rotate-right version is not redundant, though.)

IDK why you'd ever use a count > 1. On modern x86, rotate-through-carry is fairly fast with count=1, but definitely slow with a variable count or fixed count>1. IDK which way the chicken/egg problem went: CPU designers didn't make it fast because nobody used it anyway, or people stopped using it because it's slow. But probably the former because I don't remember ever seeing a use mentioned for rotate-through-carry by more than 1 bit.

For extended-precision shifts, x86 has a double-shift instruction (shld / shrd dst, src, count) that shifts dst, shifting in bits from src instead of zeros or copies of the sign bit. It doesn't work with 2 memory operands, so an extended-precision shift loop has to load and store registers with separate instructions. That's more code size than a loop using rcr dword [edi], 1 / sub edi, 4, but code size is rarely a problem on modern x86 and doing the loads/stores with separate instructions isn't slower.) More importantly, shrd shifts multiple bits at once, so you can loop over an array once to implement a multi-precision shift by 2 or more bits.

Extended rotate can only shift one bit at a time between registers, because it uses a 1-bit storage space (in flags). I think on m68k if you did want to shift multiple bits between registers, you'd probably copy a register and use regular shifts + OR to combine. (Or a rotate and AND/OR to split up the bits.)

On AMD CPUs, shld/shrd is slower than rcl/rcr-by-1, but it's the opposite on Intel CPUs. (http://agner.org/optimize/).

I can't really think of any use cases other than shifting bits between registers. Maybe if you shift a bit out, then branch on something that might set or clear the X bit, then shift the bit back in, you could use an extended rotate to do something to the low or high bit? But you could usually do the same more easily with AND, OR, or XOR with a constant.

answered Sep 29 '22 14:09

Peter Cordes

Related questions
                            
                                Accessing a label from outside a function
                            
                                MIPS (curiosity) faster way of clearing a register?
                            
                                How would one programmatically simulate lower processor frequencies?
                            
                                How to make string input in Assembly language?
                            
                                Using Assembly On Mac
                            
                                Accessing arbitrary 16-bit elements packed in a 128-bit register
                            
                                Program Counter?
                            
                                On iOS how to quickly convert RGB24 to BGR24?
                            
                                PTX "bit bucket" registers
                            
                                Using global variables declared in C, in Arm assembly
                            
                                What is the effect of using JMP on a function address?
                            
                                SSE Bilinear interpolation
                            
                                Allocating a new call stack
                            
                                The END directive in assembly language
                            
                                How to step over interrupt calls when debugging a bootloader/bios with gdb and QEMU?
                            
                                How did an assember for x86 get a label's address defined after jmp instructions referenced to it?
                            
                                Can I write multiple Assembly instructions on the same line?
                            
                                Reverse engineer optimized c code from assembly
                            
                                Division causes unbalanced parentheses
                            
                                Are addresses in x86 assembly virtual or physical?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is a circular shift with extend used for?

Tags:

cpu-architecture

assembly

bit-shift

instruction-set

68000

DrZ214

People also ask

2 Answers

fuz

Peter Cordes

Recent Activity

Donate For Us