I'm seeing behaviour I don't expect when compiling this code with different optimization levels in gcc.
The function test should fill a 64-bit unsigned integer with ones, shift them shift_size bits to the left, and return the low 32 bits as a 32-bit unsigned integer.
When I compile with -O0 I get the results I expect.
When I compile with -O2 I do not, if I try to shift by 32 bits or more.
In fact, I get exactly the results I'd expect if I were shifting a 32-bit integer by a count greater than or equal to its bit width on x86: a shift that uses only the low 5 bits of the shift size.
But I'm shifting a 64-bit number, so shifts < 64 should be legal, right?
I assume it's a bug in my understanding and not in the compiler, but I haven't been able to figure it out.
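To make that concrete, here's a tiny sketch I wrote for this question (shift_like_x86_32 is just an illustrative name, not part of the real program) showing what the -O2 behaviour looks like to me, namely a shift whose count is masked to 5 bits the way a 32-bit x86 shift is:
#include <stdint.h>

/* Illustration only: x86 32-bit shift instructions use just the low 5 bits
 * of the count, so a requested shift of 32 behaves like a shift of 0.
 * Masking explicitly keeps the C well defined while mimicking that. */
static uint32_t shift_like_x86_32(uint32_t v, unsigned int count) {
    return v << (count & 31);   /* 32 & 31 == 0, so v is returned unchanged */
}
shift_like_x86_32(~0u, 32) gives 4294967295, which is exactly what my -O2 build of test(32) prints, whereas the 64-bit shift should leave 0 in the low 32 bits.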
My machine: gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 i686-linux-gnu
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
uint32_t test(unsigned int shift_size) {
    uint64_t res = 0;
    res = ~res;
    res = res << shift_size; // Shift size < uint64_t width so this should work
    return res; // Implicit cast to uint32_t
}

int main(int argc, char *argv[])
{
    int dst;
    sscanf(argv[1], "%d", &dst); // Get arg from outside so optimizer doesn't eat everything
    printf("%" PRIu32 "l\n", test(dst));
    return 0;
}
Usage:
$ gcc -Wall -O0 test.c
$ ./a.out 32
0l
$ gcc -Wall -O2 test.c
$ ./a.out 32
4294967295l
To see the generated assembly:
$ gcc -S -Wall -O0 test.c
$ gcc -S -Wall -O2 test.c
-O1 (optimize minimally)
-O2 (optimize more)
-O3 (optimize even more)
-Ofast (optimize very aggressively, to the point of breaking standards compliance)
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
The degree to which the compiler will optimize the code it generates is controlled by the -O flag. In the absence of any -O flag, the compiler generates straightforward code with no instruction reordering or other attempt at performance improvement; passing -O or -O2 turns those optimizing transformations on.
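As a toy illustration of such a transformation (my own example, not taken from gcc's documentation), constant folding replaces a computation whose result is known at compile time with the result itself:
/* Toy example of one optimizing transformation: constant folding.
 * At -O1 and above, gcc will typically compile before() to the same
 * machine code as after(), doing the multiplication at compile time. */
int before(void) { return 6 * 7; }
int after(void)  { return 42; }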
"%u"
(or "%lu"
) and uint32_t
are not necessarily compatible.
Try
#include <inttypes.h>
//printf("%ul\n", test(dst));
printf("%" PRIu32 "l\n", test(dst));
Printing a uint32_t
value with a "%u"
(or "%lu"
) specifier can invoke Undefined Behaviour.
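For completeness, here's a minimal self-contained sketch (mine, not code from the question) of printing a fixed-width type portably:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t v = 4294967295u;    /* UINT32_MAX */
    printf("%" PRIu32 "\n", v);  /* PRIu32 always matches uint32_t */
    return 0;
}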
I was able to reproduce this. Here's the relevant bit of generated code with -O2:
movl $-1, %eax
movl $-1, %edx
sall %cl, %eax
xorl %edx, %edx
testb $32, %cl
cmovne %eax, %edx
cmovne %edx, %eax ; This appears to be the instruction in error.
; It looks as though gcc thought that %edx might still
; be zero at this point, because if the shift count is
; >= 32 then %eax should be zero after this.
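Rendered as C, my reading of that -O2 sequence (this is my interpretation of the assembly, not anything gcc emitted) is roughly:
#include <stdint.h>

/* My reading of the miscompiled -O2 sequence above; each statement
 * mirrors one instruction. For count == 32 it returns ~0 instead of 0. */
static uint32_t buggy_o2_sequence(unsigned int count) {
    uint32_t eax = 0xFFFFFFFFu;    /* movl $-1, %eax                        */
    uint32_t edx = 0xFFFFFFFFu;    /* movl $-1, %edx                        */
    eax <<= (count & 31);          /* sall %cl, %eax (count 32 shifts by 0) */
    edx = 0;                       /* xorl %edx, %edx                       */
    if (count & 32) edx = eax;     /* cmovne %eax, %edx -> edx = ~0         */
    if (count & 32) eax = edx;     /* cmovne %edx, %eax -> eax = ~0, but    */
                                   /* the low word should have become 0     */
    return eax;                    /* the low 32 bits that test() returns   */
}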
and here is the equivalent bit with -O0:
movl -16(%ebp), %eax
movl -12(%ebp), %edx
shldl %cl, %eax, %edx
sall %cl, %eax
testb $32, %cl
je L3
movl %eax, %edx
xorl %eax, %eax ; correctly zeros %eax if shift count >= 32
L3:
movl %eax, -16(%ebp)
movl %edx, -12(%ebp)
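For comparison, a C sketch of what that correct -O0 lowering computes (again my approximation, not gcc output): the two halves are shifted with shld/sal, and if the count is >= 32 the shifted low half becomes the high half and the low half is zeroed.
#include <stdint.h>

/* Approximate C rendering of the correct 32-bit lowering of a 64-bit
 * left shift shown above (a sketch, not generated code). */
static uint64_t shl64_lowered(uint32_t lo, uint32_t hi, unsigned int count) {
    unsigned int c = count & 63;
    uint32_t new_lo, new_hi;
    if (c & 32) {                     /* the testb $32, %cl branch is taken  */
        new_hi = lo << (c & 31);      /* movl %eax, %edx after sall          */
        new_lo = 0;                   /* xorl %eax, %eax                     */
    } else if (c == 0) {              /* avoid lo >> 32, which is UB in C    */
        new_hi = hi;
        new_lo = lo;
    } else {
        new_hi = (hi << c) | (lo >> (32 - c));   /* shldl %cl, %eax, %edx    */
        new_lo = lo << c;                        /* sall %cl, %eax           */
    }
    return ((uint64_t)new_hi << 32) | new_lo;
}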
Compiler is:
i686-apple-darwin11-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Thanks for posting your gcc -S
output. I had a look and while it's slightly different, the critical part has the same error as what I saw on my machine.