Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LLVM optimisation bug or undefined behaviour?

Tags:

c

llvm

clang

While compiling a larger project with clang I stumbled upon an irritating bug.

Consider the following small example:

unsigned long int * * fee();

void foo( unsigned long int q )
{
  unsigned long int i,j,k,e;
  unsigned long int pows[7];
  unsigned long int * * table;

  e = 0;
  for (i = 1; i <= 256; i *= q)
    pows[e++] = i;
  pows[e--] = i; 

  table = fee();  // need to set table to something unknown
                  // here, otherwise the compiler optimises
                  // parts of the loops below away
                  // (and no bug occurs)

  for (i = 0; i < q; i++)
    for (j = 0; j < e; j++)
      ((unsigned char*)(*table) + 5 )[i*e + j] = 0;   // bug here
}

To the best of my knowledge this code does not violate the C standard in any way, although the last line seems awkward (in the actual project, code like this appears due to excessive use of preprocessor macros).

Compiling this with clang (version 3.1 or higher) at optimisation level -O1 or higher results in code writing to the wrong position in memory.

The crucial parts of the assembly file produced by clang/LLVM read as follows: (This is GAS syntax, so to those of you who are used to Intel: Beware!)

    [...]
    callq   _fee
    leaq    6(%rbx), %r8          ## at this point, %rbx == e-1
    xorl    %edx, %edx
LBB0_4:
    [...]
    movq    %r8, %rsi
    imulq   %rdx, %rsi
    incq    %rdx
LBB0_6:
    movq    (%rax), %rcx          ## %rax == fee()
    movb    $0, (%rcx,%rsi)
    incq    %rsi
    [conditional jumps back to LBB0_6 resp. LBB0_4]
    [...]

In other words, the instructions do

(*table)[i*(e+5) + j] = 0;

instead of the last line written above. The choice of + 5 is arbitrary, adding (or subtracting) other integers results in the same behaviour. So - is this a bug in LLVM's optimisation or is there undefined behaviour going on here?

Edit: Note also that the bug disappears if I leave out the cast (unsigned char*) in the last line. In general, the bug appears to be quite sensitive to any changes.

like image 360
m_l Avatar asked Mar 07 '13 19:03

m_l


People also ask

Does LLVM optimize?

LLVM currently does not optimize well loads and stores of large aggregate types (i.e. structs and arrays). As an alternative, consider loading individual fields from memory. Aggregates that are smaller than the largest (performant) load or store instruction supported by the targeted hardware are well supported.

What is LLVM optimization?

DESCRIPTION. The opt command is the modular LLVM optimizer and analyzer. It takes LLVM source files as input, runs the specified optimizations or analyses on it, and then outputs the optimized file.

What is LLVM GCC?

DESCRIPTION. The llvm-gcc command is the LLVM C front end. It is a modified version of gcc that compiles C/ObjC programs into native objects, LLVM bitcode or LLVM assembly language, depending upon the options. By default, llvm-gcc compiles to native objects just like GCC does.

What is LLVM in compiler design?

LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture.


1 Answers

I am quite sure this is an optimizer bug. It repro's in LLVM-2.7 and LLVM-3.1, the only versions I have access to.

I posted a bug to the LLVM Bugzilla.

The bug is demonstrated by this SSCCE:

#include <stdio.h>

unsigned long int * table;

void foo( unsigned long int q )
{
  unsigned long int i,j,e;

  e = 0;
  for (i = 1; i <= 256; i *= q)
    e++;
  e--;

  for (i = 0; i < q; i++)
    for (j = 0; j < e; j++)
      ((unsigned char*)(table) + 13 )[i*e + j] = 0;   // bug here
}

int main() {
    unsigned long int v[8];
    int i;
    memset(v, 1, sizeof(v));

    table = v;
    foo(2);

    for(i=0; i<sizeof(v); i++) {
        printf("%d", ((unsigned char*)v)[i]);
    }
    puts("");
    return 0;
}

It should print

1111111111111000000000000000011111111111111111111111111111111111

under GCC and "clang -O0". The incorrect output observed with LLVM is

0000000011111111111110000000011111111111111111111111111111111111

Thanks for noticing this!

like image 93
nneonneo Avatar answered Sep 21 '22 05:09

nneonneo