Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

logical shift right on signed data

Before anything, no this is not my homework, it's a lab given by a book called Computer Systems A Programmer's Perspective (Excellent book btw)

I need to perform a logical shift right on signed integers without using any the following:

  • casting
  • if, while, switch, for, do-while, ? :
  • pointers of any type

Allowed operators are: ! + ~ | >> << ^

What have I tried so far?

/* 
 * logicalShift - shift x to the right by n, using a logical shift
 *   Can assume that 0 <= n <= 31
 *   Examples: logicalShift(0x87654321,4) = 0x08765432
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 20
 *   Rating: 3 
 */
int logicalShift(int x, int n) {
    int mask = ~0;
    int shiftAmount = 31 + ((~n)+1);//this evaluates to 31 - n on two's complement machines
    mask = mask << shiftAmount;
    mask = ~mask;//If n equals 0, it means we have negated all bits and hence have mask = 0
    x = x >> n;
    return x & mask; 
}

This works just fine as long as n doesn't equal 0, when it does this code breaks because the mask will turn to all 0s and the function will return 0.

I'd appreciate any hint in the right direction rather than a complete code.

Again, this is not a homework; the lab-assignments are publicly available here http://csapp.cs.cmu.edu/public/labs.html

PS: Not a duplicate, do not post solutions that involve casting to unsigned and then shifting.

like image 380
Fingolfin Avatar asked Nov 04 '12 18:11

Fingolfin


People also ask

What does a logical shift right do?

Logical Right Shifts When shifting right with a logical right shift, the least-significant bit is lost and a 0 is inserted on the other end. For positive numbers, a single logical right shift divides a number by 2, throwing out any remainders.

Is the signed right shift operator?

The right shift operator ( >> ) returns the signed number represented by the result of performing a sign-extending shift of the binary representation of the first operand (evaluated as a two's complement bit string) to the right by the number of bits, modulo 32, specified in the second operand.

What is the result of right shift operator >> on 0011000 >> 2?

8) What is the result of Right Shift Operator >> on (00110000>>2).? Explanation: Right Shift Operator shift bits on the right side and fills Zeroes on the left side.

Which shift is used for signed binary number?

Shifting left by n bits on a signed or unsigned binary number has the effect of multiplying it by 2n. Shifting right by n bits on a two's complement signed binary number has the effect of dividing it by 2n, but it always rounds down (towards negative infinity).


2 Answers

Cheesy answer that clearly violates the spirit of the question: casting1 isn't the only way to do type conversion in C or C++

return (x | 0U) >> n;    // assumes int and unsigned are the same width

(Godbolt for x86-64 GCC). Note that it also relies on implementation-defined behaviour for n==0 with negative x, because that leaves the sign bit set. But on normal 2's complement implementations the conversion between unsigned and int simply preserves the bit-pattern, as well as the value of positive ints. Making the type-pun explicit with memcpy instead of using implicit type-conversion could make it more robust. (Since & address-of isn't allowed by the rules, C++20 std::bit_cast<int> of the shift result could work without doing that. But it has cast in the name. Union type-punning is well-defined in C99, but not ISO C++. Only as an extension by many mainstream C++ implementations.)

The | operator uses the rules for the usual arithmetic conversions to make both operands the same type. We can use this to implicitly convert to unsigned, rules-lawyering around the artificial restriction on casting.

0U numeric literal suffix is not a cast, it's just plain language syntax for an unsigned constant. An alternative is x & UINT_MAX or x | (~UINT_MAX). Or even more fun, if you know the width of int and it's less than or equal to unsigned, you can write a constant like 0xffffffff that's too large for a positive signed int, and it will have type unsigned if it fits in that, or long or long long or even the unsigned versions of those if it's larger.

Of course, unsigned tmp = x; is also not a cast, so really just do that instead! But this is even more clearly just a silly rules-lawyering to find a loophole in question, not what was intended.


Assumptions:

  • This might only work for 2's complement, where conversion from signed to unsigned doesn't modify the bit-pattern of the object representation. (Related: How can an (int) be converted to (unsigned int) while preserving the original bit pattern? - conversion preserves values, not bit-patterns, before applying the modulo-reduction to unsigned.)

  • unsigned int is the same width as int, so we don't lose any bits, and don't sign-extend to introduce high 1 bits before a logical right shift.**

e.g. (x | 0ULL) >> n doesn't work; that would sign-extend x to 64-bit (or whatever) before a logical right shift, so the low int-width of the result would effectively have copies of the sign bit shifted in. (Assuming 2's complement.)

Workaround: use an unsigned bitfield with the exact number of non-padding bits as int. Signed integer types have numeric_limits<T>::digits non-sign bits, and 1 sign bit. (And unknown amounts of padding). This will require assignment, not just using it with |.

struct uint { 
   unsigned long u : (1 + std::numeric_limits<int>::digits);
}

If you want to memcpy into this struct instead of assign (e.g. to make sure the bit-pattern doesn't change before right shifting, on non-2's-complement systems?) you could pad with something like char pad[sizeof(int)];. That's definitely overkill, more padding than necessary to make it at least as large as int (which is required for memcpy(&struct_tmp, &x, sizeof(x)) or the other direction.). I'm not sure if the struct is guaranteed to be at least as wide as an unsigned long because we used that as the base type for the bitfield, I'd have to double-check that part of the standard, too. It is the case for GCC and MSVC, though; unsigned long long brings its sizeof up to 8. IIRC, unsigned long is guaranteed to be at least as wide as int.

If we wanted to calculate the exact amount of padding needed for a char[] array, we'd have to deal with the possibility of sizeof(int) - sizeof(unsigned) being negative. But calculating the number of extra bits for another padding bitfield member could work, calculating padding as CHAR_BIT * sizeof(int) - (1 + digits)

In C99 we could use union type punning to replace memcpy, but we still probably need a bitfield to make sure we truncate an unsigned value to the right width exactly.


Footnote 1: ISO C++ 7.6.3 Explicit type conversion (cast notation)[expr.cast] says:

2: An explicit type conversion can be expressed using functional notation, a type conversion operator (dynamic_­cast, static_­cast, reinterpret_­cast, const_­cast), or the cast notation.

    cast-expression:
        unary-expression
        ( type-id ) cast-expression

The name of this whole section is [expr.cast], so it's reasonable to treat any of these syntaxes for type conversion as casts, not just a "cast expression". Fortunately there's no need to try to justify unsigned(x) >> n not including a cast.

like image 69
Peter Cordes Avatar answered Oct 06 '22 19:10

Peter Cordes


This question is the first result searching for logical shift in C++.

Therefore, it makes sense also to answer the general case, where cast is allowed - because none of the codes shown here is compiled (GCC 9.2, -O3) with the correct (and fast) op-code (just a single shr instruction instead of sar).

Cast Version

This version works also for the upcoming int128_t (currently __int128 in GCC, Clang and ICC) and other future types. If you're allowed to use type_traits and your code should work in future without thinking what would be the correct unsigned type, you should use it for the correct cast.

Code

#include <type_traits>

template<class T>
inline T logicalShift(T t1, T t2) {
  return 
    static_cast<
      typename std::make_unsigned<T>::type
    >(t1) >> t2;
}

Analysis of Assembler code

Packed it into a f(T x, T y) { return logicalShift(x, y); } results in following Assembler instructions (GCC9.2, -O3):

f(int, int):
        mov     eax, edi
        mov     ecx, esi
        shr     eax, cl
        ret
f(unsigned int, unsigned int):
        mov     eax, edi
        mov     ecx, esi
        shr     eax, cl
        ret
f(__int128, __int128):
        mov     rcx, rdx
        mov     rax, rdi
        mov     rdx, rsi
        shrd    rax, rsi, cl
        shr     rdx, cl
        xor     esi, esi
        and     ecx, 64
        cmovne  rax, rdx
        cmovne  rdx, rsi
        ret
f(unsigned __int128, unsigned __int128):
        mov     rcx, rdx
        mov     rax, rdi
        mov     rdx, rsi
        shrd    rax, rsi, cl
        shr     rdx, cl
        xor     esi, esi
        and     ecx, 64
        cmovne  rax, rdx
        cmovne  rdx, rsi
        ret

while a T a, b; T c = a >> b; results in:

f(int, int):
        mov     eax, edi      # 32-bit registers
        mov     ecx, esi
        sar     eax, cl       # lower 8-bit of cx register
        ret
f(long, long):
        mov     rax, rdi      # 64-bit registers
        mov     ecx, esi
        sar     rax, cl
        ret
f(unsigned int, unsigned int):
        mov     eax, edi
        mov     ecx, esi
        shr     eax, cl
        ret
f(__int128, __int128):
        mov     rcx, rdx
        mov     rdx, rsi
        mov     rax, rdi
        sar     rdx, cl
        shrd    rax, rsi, cl
        mov     rsi, rdx
        sar     rsi, 63
        and     ecx, 64
        cmovne  rax, rdx
        cmovne  rdx, rsi
        ret

We see, difference is mainly only shr instead of sar (and some more for __int128). What code could be faster?


No-Cast Versions

(with a reduced instruction set to ~ & ^ | + << >>)

The Problem - Shift Left (SAL, SHL)

The original idea of @Fingolfin is quite fine. But our processor won't do what we aspect on our first mind of int mask = ~0 << n for n >= 32;, but why?

The C++ Standard (draft N4713, 8.5.7, 2nd) says for <<:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

Sound like (E1 << E2) % (UINTxx_MAX + 1), where we just fill from right with 0 and cut off the leading bits with the modulo operation. Simple and clear.

Analyze Shift Left

Assembler codes (GCC 9.2, -O3) for 16-bit short, 32-bit int and 64-bit long are:

g(short, short):
        movsx   eax, di    # 16-bit to 32-bit register
        mov     ecx, esi
        sal     eax, cl    # 1st is 32-bit, 2nd is 8-bit register
        ret
g(int, int):
        mov     eax, edi   # 32-bit registers
        mov     ecx, esi
        sal     eax, cl    # 1st is 32-bit, 2nd is 8-bit register
        ret
g(long, long):
        mov     rax, rdi   # 64-bit registers
        mov     ecx, esi
        sal     rax, cl    # 1st is 64-bit, 2nd is 8-bit register
        ret

So, we discussed what we assert from ~0 << i for int i = 0; i <= 33; i++, but what do we really get?

 0: 11111111111111111111111111111111
 1: 11111111111111111111111111111110
 2: 11111111111111111111111111111100
 3: 11111111111111111111111111111000
 4: 11111111111111111111111111110000
 5: 11111111111111111111111111100000
 6: 11111111111111111111111111000000
 7: 11111111111111111111111110000000
 8: 11111111111111111111111100000000
 9: 11111111111111111111111000000000
10: 11111111111111111111110000000000
11: 11111111111111111111100000000000
12: 11111111111111111111000000000000
13: 11111111111111111110000000000000
14: 11111111111111111100000000000000
15: 11111111111111111000000000000000
16: 11111111111111110000000000000000
17: 11111111111111100000000000000000
18: 11111111111111000000000000000000
19: 11111111111110000000000000000000
20: 11111111111100000000000000000000
21: 11111111111000000000000000000000
22: 11111111110000000000000000000000
23: 11111111100000000000000000000000
24: 11111111000000000000000000000000
25: 11111110000000000000000000000000
26: 11111100000000000000000000000000
27: 11111000000000000000000000000000
28: 11110000000000000000000000000000
29: 11100000000000000000000000000000
30: 11000000000000000000000000000000
31: 10000000000000000000000000000000
32: 11111111111111111111111111111111
33: 11111111111111111111111111111110

We see, the result is more like ~0 << (i%2^5).

So, look at the long (long aka. int64_t) case: (compiled with MSVC for x86)

 0: 1111111111111111111111111111111111111111111111111111111111111111
 1: 1111111111111111111111111111111111111111111111111111111111111110
 2: 1111111111111111111111111111111111111111111111111111111111111100
 3: 1111111111111111111111111111111111111111111111111111111111111000
 4: 1111111111111111111111111111111111111111111111111111111111110000
 5: 1111111111111111111111111111111111111111111111111111111111100000
 6: 1111111111111111111111111111111111111111111111111111111111000000
 7: 1111111111111111111111111111111111111111111111111111111110000000
 8: 1111111111111111111111111111111111111111111111111111111100000000
 9: 1111111111111111111111111111111111111111111111111111111000000000
10: 1111111111111111111111111111111111111111111111111111110000000000
11: 1111111111111111111111111111111111111111111111111111100000000000
12: 1111111111111111111111111111111111111111111111111111000000000000
13: 1111111111111111111111111111111111111111111111111110000000000000
14: 1111111111111111111111111111111111111111111111111100000000000000
15: 1111111111111111111111111111111111111111111111111000000000000000
16: 1111111111111111111111111111111111111111111111110000000000000000
17: 1111111111111111111111111111111111111111111111100000000000000000
18: 1111111111111111111111111111111111111111111111000000000000000000
19: 1111111111111111111111111111111111111111111110000000000000000000
20: 1111111111111111111111111111111111111111111100000000000000000000
21: 1111111111111111111111111111111111111111111000000000000000000000
22: 1111111111111111111111111111111111111111110000000000000000000000
23: 1111111111111111111111111111111111111111100000000000000000000000
24: 1111111111111111111111111111111111111111000000000000000000000000
25: 1111111111111111111111111111111111111110000000000000000000000000
26: 1111111111111111111111111111111111111100000000000000000000000000
27: 1111111111111111111111111111111111111000000000000000000000000000
28: 1111111111111111111111111111111111110000000000000000000000000000
29: 1111111111111111111111111111111111100000000000000000000000000000
30: 1111111111111111111111111111111111000000000000000000000000000000
31: 1111111111111111111111111111111110000000000000000000000000000000
32: 1111111111111111111111111111111100000000000000000000000000000000
33: 1111111111111111111111111111111000000000000000000000000000000000
34: 1111111111111111111111111111110000000000000000000000000000000000
35: 1111111111111111111111111111100000000000000000000000000000000000
36: 1111111111111111111111111111000000000000000000000000000000000000
37: 1111111111111111111111111110000000000000000000000000000000000000
38: 1111111111111111111111111100000000000000000000000000000000000000
39: 1111111111111111111111111000000000000000000000000000000000000000
40: 1111111111111111111111110000000000000000000000000000000000000000
41: 1111111111111111111111100000000000000000000000000000000000000000
42: 1111111111111111111111000000000000000000000000000000000000000000
43: 1111111111111111111110000000000000000000000000000000000000000000
44: 1111111111111111111100000000000000000000000000000000000000000000
45: 1111111111111111111000000000000000000000000000000000000000000000
46: 1111111111111111110000000000000000000000000000000000000000000000
47: 1111111111111111100000000000000000000000000000000000000000000000
48: 1111111111111111000000000000000000000000000000000000000000000000
49: 1111111111111110000000000000000000000000000000000000000000000000
50: 1111111111111100000000000000000000000000000000000000000000000000
51: 1111111111111000000000000000000000000000000000000000000000000000
52: 1111111111110000000000000000000000000000000000000000000000000000
53: 1111111111100000000000000000000000000000000000000000000000000000
54: 1111111111000000000000000000000000000000000000000000000000000000
55: 1111111110000000000000000000000000000000000000000000000000000000
56: 1111111100000000000000000000000000000000000000000000000000000000
57: 1111111000000000000000000000000000000000000000000000000000000000
58: 1111110000000000000000000000000000000000000000000000000000000000
59: 1111100000000000000000000000000000000000000000000000000000000000
60: 1111000000000000000000000000000000000000000000000000000000000000
61: 1110000000000000000000000000000000000000000000000000000000000000
62: 1100000000000000000000000000000000000000000000000000000000000000
63: 1000000000000000000000000000000000000000000000000000000000000000
64: 0000000000000000000000000000000000000000000000000000000000000000
65: 0000000000000000000000000000000000000000000000000000000000000000

Boom!

(same till 31 holds for short in GCC too, cause it uses 32-bit EAX register for sal)

But, this results are only created by the compiler:

x86 msvc v19.22, /O2:
_x$ = 8                                       ; size = 8
_y$ = 16                                                ; size = 8
__int64 g(__int64,__int64) PROC                                  ; g, COMDAT
        mov     eax, DWORD PTR _x$[esp-4]
        mov     edx, DWORD PTR _x$[esp]
        mov     ecx, DWORD PTR _y$[esp-4]
        jmp     __allshl
__int64 g(__int64,__int64) ENDP  
x64 msvc v19.22, /O2:
x$ = 8
y$ = 16
__int64 g(__int64,__int64) PROC                                  ; g, COMDAT
        mov     rax, rcx
        mov     rcx, rdx
        shl     rax, cl
        ret     0
__int64 g(__int64,__int64) ENDP      

And the x64 MSVC code shows the same behavior like the GCC 9.2 code - with shl instead of sal.

From that point, we know now that the processor itself (Intel Core 6th Gen.) use only the last digits of the cl register, depending on the length of the first register for the shift operation, even if the C++ standard says something else.

A small fix

So, this is, where the code breaks. Instinctively I would take a shiftAmount of 32 - n and get into the upper problem, you already avoided this by using a shiftAmount of 31 - n. With knowledge that n is 0..31, there is none shiftAmount of 32. Fine.

But, decreasing on one side means increasing on the other. The mask now need to start at -2 (we do not shift 0b1111, we shift 0b1110):

int logSh3(int x, int n) {
    int mask = ~2 + 1;
    int shiftAmount = 31 + ((~n) + 1);//this evaluates to 31 - n on two's complement machines
    mask = mask << shiftAmount;
    mask = ~mask;//If n equals 0, it means we have negated all bits and hence have mask = 0
    x = x >> n;
    return x & mask;
}

And it works!

Alternate code and analysis

Fingolfins code (fixed)

The upper code, as Assembler (GCC 9.2, -O3):

logSh3(int, int):
        mov     ecx, 31
        mov     edx, -2
        mov     eax, edi
        sub     ecx, esi
        sal     edx, cl
        mov     ecx, esi
        not     edx
        sar     eax, cl
        and     eax, edx
        ret

9 instructions

harolds code

int logSh2(int x, int n) {
    int shiftAmount = 31 + ((~n) + 1);//this evaluates to 31 - n on two's complement machines
    int mask = 1 << shiftAmount;
    mask |= mask + ((~1) + 1);
    x = x >> n;
    return x & mask;
}
logSh2(int, int):
        mov     ecx, esi
        mov     r8d, edi
        mov     edi, -2147483648
        shr     edi, cl
        sar     r8d, cl
        lea     eax, [rdi-1]
        or      eax, edi
        and     eax, r8d
        ret

8 instructions

Can we do better?

Another solutions

Instead of a left shift, we could do a right shift of 0b1000, shift it back by one and inverse.

int logSh4(int x, int n) {
    int mask = 0x80000000;
    mask = mask >> n;
    mask = mask << 1;
    mask = ~mask;//If n equals 0, it means we have negated all bits and hence have mask = 0
    x = x >> n;
    return x & mask;
}
logSh4(int, int):
        mov     ecx, esi
        mov     edx, -2147483648
        sar     edx, cl
        sar     edi, cl
        lea     eax, [rdx+rdx]
        not     eax
        and     eax, edi
        ret

7 instruction

Better way?

Let's shift 0b0111 to the right, shift it one back to left and add 1. So we spare us the inverse:

int logSh5(int x, int n) {
    int mask = 0x7fffffff;
    mask = mask >> n;
    mask = (mask << 1) | 1;
    x = x >> n;
    return x & mask;
}
logSh5(int, int):
        mov     ecx, esi
        mov     eax, 2147483647
        sar     eax, cl
        sar     edi, cl
        lea     eax, [rax+1+rax]
        and     eax, edi
        ret

6 instructions left. Fine. (But a simple cast is still the best solution in practice)

like image 28
codewerfer Avatar answered Oct 06 '22 19:10

codewerfer