Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCC left shift overflow

Tags:

c

macos

gcc

The following little program is very awkward using GCC version 4.2.1 (Apple Inc. build 5664) on a Mac.

#include <stdio.h>

int main(){
        int x = 1 << 32;
        int y = 32;
        int z = 1 << y;
        printf("x:%d, z: %d\n", x, z);
}

The result is x:0, z: 1.
Any idea why the values of x and z are different?
Thanks a lot.

like image 358
Lucas Avatar asked Oct 06 '10 10:10

Lucas


People also ask

Can left shift overflow?

If a bit goes further left than the place of the most-significant digit, the bit is lost. This means that left shifting a number too far will result in overflow, where the number of bits that are saved by the data type are insufficient to represent the actual number.

How do you use bit Shift?

Logical bit shifting may be useful for multiplying or dividing unsigned integers by powers of two. For example, if the value "0001" or "1" is shifted left, it becomes "0010" or "2," shifted to the left again it becomes "0100," or "4." Shifting to the right has an opposite effect of dividing the value by two per shift.


3 Answers

Short answer: the Intel processor masks the shift count to 5 bits (maximum 31). In other words, the shift actually performed is 32 & 31, which is 0 (no change).

The same result appears using gcc on a Linux 32-bit PC.

I assembled a shorter version of this program because I was puzzled by why a left shift of 32 bits should result in a non-zero value at all:

int main(){
    int y = 32;
    unsigned int z = 1 << y;
    unsigned int k = 1;
    k <<= y;
    printf("z: %u, k: %u\n", z, k);
}

..using the command gcc -Wall -o a.s -S deleteme.c (comments are my own)

main:
leal    4(%esp), %ecx
andl    $-16, %esp
pushl   -4(%ecx)
pushl   %ebp
movl    %esp, %ebp
pushl   %ecx
subl    $36, %esp
movl    $32, -16(%ebp)  ; y = 32
movl    -16(%ebp), %ecx ; 32 in CX register
movl    $1, %eax        ; AX = 1
sall    %cl, %eax       ; AX <<= 32(32)
movl    %eax, -12(%ebp) ; z = AX
movl    $1, -8(%ebp)    ; k = 1
movl    -16(%ebp), %ecx ; CX = y = 32
sall    %cl, -8(%ebp)   ; k <<= CX(32)
movl    -8(%ebp), %eax  ; AX = k
movl    %eax, 8(%esp)
movl    -12(%ebp), %eax
movl    %eax, 4(%esp)
movl    $.LC0, (%esp)
call    printf
addl    $36, %esp
popl    %ecx
popl    %ebp
leal    -4(%ecx), %esp
ret

Ok so what does this mean? It's this instruction that puzzles me:

sall    %cl, -8(%ebp)   ; k <<= CX(32)

Clearly k is being shifted left by 32 bits.

You've got me - it's using the sall instruction which is an arithmetic shift. I don't know why rotating this by 32 results in the bit re-appearing in the initial position. My initial conjecture would be that the processor is optimised to perform this instruction in one clock cycle - which means that any shift by more than 31 would be regarded as a don't care. But I'm curious to find the answer to this because I would expect that the rotate should result in all bits falling off the left end of the data type.

I found a link to http://faydoc.tripod.com/cpu/sal.htm which explains that the shift count (in the CL register) is masked to 5 bits. This means that if you tried to shift by 32 bits the actual shift performed would be by zero bits (i.e. no change). There's the answer!

like image 66
PP. Avatar answered Oct 19 '22 16:10

PP.


If your ints are 32 bits or shorter, the behaviour is undefined ... and undefined behaviour cannot be explained.

The Standard says:

6.5.7/3 [...] If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.


You can check your int width bit size, for example with:

#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("bits in an int: %d\n", CHAR_BIT * (int)sizeof (int));
    return 0;
}

And you can check your int width (there can be padding bits), for example with:

#include <limits.h>
#include <stdio.h>
int main(void) {
    int width = 0;
    int tmp = INT_MAX;
    while (tmp) {
        tmp >>= 1;
        width++;
    }
    printf("width of an int: %d\n", width + 1 /* for the sign bit */);
    return 0;
}

Standard 6.2.6.2/2: For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit

like image 28
pmg Avatar answered Oct 19 '22 16:10

pmg


The C99 standard says that the result of shifting a number by the width in bits (or more) of the operand is undefined. Why?

Well this allows compilers to create the most efficient code for a particular architecture. For instance, the i386 shift instruction uses a five bit wide field for the number of bits to shift a 32 bit operand by. The C99 standard allows the compiler to simply take the bottom five bits of the shift count and put them in the field. Clearly this means that a shift of 32 bits (= 100000 in binary) is therefore identical to a shift of 0 and the result will therefore be the left operand unchanged.

A different CPU architecture might use a wider bit field, say 32 bits. The compiler can still put the shift count directly in the field but this time the result will be 0 because a shift of 32 bits will shift all the bits out of the left operand.

If the C99 defined one or other of these behaviours as correct, either the compiler for Intel has to put special checking in for shift counts that are too big or the compiler for non i386 has to mask the shift count.

The reason why

   int x = 1 << 32;

and

   int z = 1 << y;

give different results is because the first calculation is a constant expression and can be performed entirely by the compiler. The compiler must be calculating constant expressions by using 64 bit arithmetic. The second expression is calculated by the code generated by the compiler. Since the type of both y and z is int the code generates a calculation using 32 bit wide ints (int is 32 bits on both i386 and x86_64 with gcc on Apple).

like image 7
JeremyP Avatar answered Oct 19 '22 15:10

JeremyP