 

Is this a clang optimizer bug or an undefined behavior in C?

This code gives different results for -O1 and -O2:

/*
    Example of a clang optimization bug.
    Mark Adler, August 8, 2015.

    Using -O0 or -O1 takes a little while and gives the correct result:

        47 bits set (4294967296 loops)

    Using -O2 or -O3 optimizes out the loop, returning immediately with:

        0 bits set (4294967296 loops)

    Of course, there weren't really that many loops.  The number of loops was
    calculated, correctly, by the compiler when optimizing.  But it got the
    number of bits set wrong.

    This is with:

        Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
        Target: x86_64-apple-darwin14.4.0
*/

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = 0;
    uint64_t loops = 0;
    uint32_t x = 0;
    do {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    } while (x);
    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}

So is this a bug? Or is there somehow an undefined behavior in there, that the compiler is within its rights to give different results for?

As far as I can tell from the C99 standard, the do loop to go through all uint32_t values is valid since the increment of the largest unsigned integer value is well defined to result in zero.

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

asked Aug 08 '15 by Mark Adler

2 Answers

I'm reasonably sure this is a bug in clang. I see no undefined behavior in the program (assuming that it doesn't exceed the implementation's capacity limits) -- other than a small problem in the printf call that I'll address below (and that has now been addressed in an edit to the question). It's possible that I've missed something, but I don't think so.

If I have missed something, I expect it to be pointed out shortly. If this answer remains uncontradicted after a few days, I'll take it as a strong indication that it is indeed a bug in clang.

UPDATE: Mark Adler, the original poster, has reported this and confirmed that it's a bug in pre-3.6.0 clang, corrected in later versions. I will shamelessly steal the link to the bug report from his answer.

The correct output is:

47 bits set (4294967296 loops) 

To address some of the things that have been pointed out (or that I've noticed myself):

static uint64_t vec[1 << 26] = {0}; 

This is a large object (2^29 bytes, or half a gigabyte, assuming CHAR_BIT==8), but it apparently does not exceed the implementation's capacity. If it did, it would be rejected. I'm not 100% sure the standard requires this, but since the program does work correctly at lower optimization levels, we can assume that the object is not too large.

vec[31415927] = 0xb9fe2f2fedf7ebbd 

The constant 0xb9fe2f2fedf7ebbd is not a problem. Its value is between 2^63 and 2^64, so it's within the range of uint64_t. The type of a hexadecimal integer constant is wide enough to hold its value (unless it exceeds ULLONG_MAX, but that's not the case here).

if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f))) 

I briefly thought that the left shift might be a problem, but it isn't. The left operand is of type uint64_t, and the right operand is in the range 0 .. 63. A left shift of 64 bits would have undefined behavior, but that's not the case here.

printf("%llu bits set (%llu loops)\n", count, loops); 

The following issue has since been addressed by an update to the question; I've tried the updated version of the code and got the same results.

%llu requires an argument of type unsigned long long; count and loops are of type uint64_t. Here, depending on the implementation, we might have undefined behavior (on my system uint64_t is a typedef for unsigned long, and I get a warning). But it's not likely to cause any actual problems (unsigned long long and uint64_t typically have the same representation even if they're not the same type), and when I add casts to avoid any UB:

printf("%llu bits set (%llu loops)\n",        (unsigned long long)count,        (unsigned long long)loops); 

I get the same behavior. The following results are for a program with casts added to the printf call.

Using gcc 5.2.0 on my 64-bit system, I get the correct output with -O0, -O1, -O2, and -O3, with or without -m32. The timing indicates that gcc does not eliminate the loop at any optimization level.

Using clang 3.4 on the same system, I get the correct output with -O0 or -O1, but incorrect output (0 bits set) at -O2 or -O3. Timing indicates that the loop is eliminated at -O2 and -O3. When I compile with clang -m32, the output is correct (and the loop is not eliminated) at all optimization levels.

When I change the declaration of loops to

volatile uint64_t loops = 0; 

I get correct output at all optimization levels (and the loop is not eliminated).

A further tweak to the program (not shown here) shows that vec[31415927] really is set to 0xb9fe2f2fedf7ebbd, even when optimization produces the wrong bit count.

answered Sep 28 '22 by Keith Thompson


It is a bug in pre-3.6.0 clang. (The "3.6.0svn" version precedes 3.6.0.) Since it had already been fixed in the 3.6.0 release five months earlier, I reported the bug to Apple, whose most recent distribution of compiler tools still ships the affected version.

answered Sep 28 '22 by Mark Adler