Compilation fails with OpenMP on Mac OS X Lion (memcpy and SSE intrinsics)

Tags: c++, c, macos, openmp, fortify-source

I have stumbled upon the following problem. The code snippet below does not link on Mac OS X with any Xcode version I tried (4.4, 4.5):

#include <stdlib.h>
#include <string.h>
#include <emmintrin.h>

int main(int argc, char *argv[])
{
  char *temp;
#pragma omp parallel
  {
    __m128d v_a, v_ar;
    memcpy(temp, argv[0], 10);
    v_ar = _mm_shuffle_pd(v_a, v_a, _MM_SHUFFLE2 (0,1));
  }
}

The code is provided only as an example and would segfault if you ran it. The point is that it does not build. It is compiled with the following command line, which produces the linker errors shown after it:

/Applications/Xcode.app/Contents/Developer/usr/bin/gcc test.c -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk -mmacosx-version-min=10.7 -fopenmp

 Undefined symbols for architecture x86_64:
"___builtin_ia32_shufpd", referenced from:
    _main.omp_fn.0 in ccJM7RAw.o
"___builtin_object_size", referenced from:
    _main.omp_fn.0 in ccJM7RAw.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status

The code compiles just fine without the -fopenmp flag. I googled around and found a solution for the first problem, the one connected with memcpy: adding -fno-builtin or -D_FORTIFY_SOURCE=0 to the gcc argument list. I did not manage to solve the second problem (the SSE intrinsic).

Can anyone help me solve this? My questions:

Most importantly: how do I get rid of the "___builtin_ia32_shufpd" error?
What exactly is the reason for the memcpy problem, and what does the -D_FORTIFY_SOURCE=0 flag actually do?

asked Oct 17 '12 10:10

angainor

1 Answer

This is a bug in the way Apple's LLVM-backed GCC (llvm-gcc) transforms OpenMP regions and handles calls to the built-ins inside them. The problem can be diagnosed by examining the intermediate tree dumps (obtainable by passing the -fdump-tree-all option to gcc). Without OpenMP enabled, the following final code representation is generated (from test.c.016t.fap):

main (argc, argv)
{
  D.6544 = __builtin_object_size (temp, 0);
  D.6545 = __builtin_object_size (temp, 0);
  D.6547 = __builtin___memcpy_chk (temp, D.6546, 10, D.6545);
  D.6550 = __builtin_ia32_shufpd (v_a, v_a, 1);
}

This is a C-like representation of how the compiler sees the code internally after all transformations. It is what then gets turned into assembly instructions. (Only the lines that refer to the built-ins are shown here.)

With OpenMP enabled, the parallel region is extracted into its own function, main.omp_fn.0:

main.omp_fn.0 (.omp_data_i)
{
  void * (*<T4f6>) (void *, const <unnamed type> *, long unsigned int, long unsigned int) __builtin___memcpy_chk.21;
  long unsigned int (*<T4f5>) (const <unnamed type> *, int) __builtin_object_size.20;
  vector double (*<T6b5>) (vector double, vector double, int) __builtin_ia32_shufpd.23;
  long unsigned int (*<T4f5>) (const <unnamed type> *, int) __builtin_object_size.19;

  __builtin_object_size.19 = __builtin_object_size;
  D.6587 = __builtin_object_size.19 (D.6603, 0);
  __builtin_ia32_shufpd.23 = __builtin_ia32_shufpd;
  D.6593 = __builtin_ia32_shufpd.23 (v_a, v_a, 1);
  __builtin_object_size.20 = __builtin_object_size;
  D.6588 = __builtin_object_size.20 (D.6605, 0);
  __builtin___memcpy_chk.21 = __builtin___memcpy_chk;
  D.6590 = __builtin___memcpy_chk.21 (D.6609, D.6589, 10, D.6588);
}

Again, I have left only the code that refers to the built-ins. What is apparent (although the reason for it is not immediately apparent to me) is that the OpenMP code transformer insists on calling all the built-ins through function pointers. These pointer assignments:

__builtin_object_size.19 = __builtin_object_size;
__builtin_ia32_shufpd.23 = __builtin_ia32_shufpd;
__builtin_object_size.20 = __builtin_object_size;
__builtin___memcpy_chk.21 = __builtin___memcpy_chk;

generate external references to symbols which are not really symbols but rather names that get special treatment by the compiler. The linker then tries to resolve them but is unable to find any of the __builtin_* names in any of the object files that the code is linked against. This is also observable in the assembly code that one can obtain by passing -S to gcc:

LBB2_1:
    movapd  -48(%rbp), %xmm0
    movl    $1, %eax
    movaps  %xmm0, -80(%rbp)
    movaps  -80(%rbp), %xmm1
    movl    %eax, %edi
    callq   ___builtin_ia32_shufpd
    movapd  %xmm0, -32(%rbp)

This is an ordinary function call with three arguments: one integer in %edi (copied there from %eax) and two XMM arguments in %xmm0 and %xmm1, with the result returned in %xmm0 (as per the System V AMD64 ABI calling convention). In contrast, the code generated without -fopenmp is the instruction-level expansion of the intrinsic, as it is supposed to be:

LBB1_3:
    movapd  -64(%rbp), %xmm0
    shufpd  $1, %xmm0, %xmm0
    movapd  %xmm0, -80(%rbp)
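For reference, here is a minimal sketch of what the intrinsic computes, which shows why a single shufpd instruction is enough. The helper name swap_halves is made up for this illustration; _mm_set_pd, _mm_shuffle_pd and _mm_storeu_pd are standard SSE2 intrinsics from emmintrin.h.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>

/* Hypothetical helper (not from the question): applies the same shuffle as
   the example to the vector {lo, hi} and stores the result in out[]. */
static void swap_halves(double lo, double hi, double out[2])
{
  assert(out != NULL);
  __m128d v = _mm_set_pd(hi, lo);  /* _mm_set_pd takes the high element first */
  /* _MM_SHUFFLE2(0,1) evaluates to 1: the low result element is taken from
     the high half of the first operand, the high result element from the
     low half of the second operand. With both operands equal this swaps. */
  __m128d r = _mm_shuffle_pd(v, v, _MM_SHUFFLE2(0, 1));
  _mm_storeu_pd(out, r);           /* out[0] = low element, out[1] = high */
}
```

Calling swap_halves(1.0, 2.0, out) leaves out[0] == 2.0 and out[1] == 1.0, i.e. the two halves of the vector are exchanged.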

What happens when you pass -D_FORTIFY_SOURCE=0 is that memcpy is not replaced by the "fortified" checking version, and a regular call to memcpy is used instead. This eliminates the references to __builtin_object_size and __builtin___memcpy_chk but cannot remove the call to the __builtin_ia32_shufpd built-in.
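As a rough illustration of what the fortified variant amounts to, the sketch below spells out by hand the kind of call that gets substituted for memcpy when fortification is active. The helper name checked_copy and the 16-byte buffer are assumptions made for this example; __builtin_object_size and __builtin___memcpy_chk are the GCC/Clang built-ins that appear in the tree dumps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a string into a local 16-byte buffer the way a
   fortified memcpy would, then hands the bytes back to the caller.
   Returns the destination size the compiler reported. */
static size_t checked_copy(const char *src, char out[16])
{
  char buf[16];
  size_t n = strlen(src) + 1;      /* bytes to copy, including the '\0' */
  /* For a plain local array the compiler normally knows the size (16 here);
     with type 0 it yields (size_t)-1 when the size cannot be determined. */
  size_t room = __builtin_object_size(buf, 0);
  assert(n <= sizeof buf && n <= room);
  /* The checking variant aborts at run time if n exceeds room. */
  __builtin___memcpy_chk(buf, src, n, room);
  memcpy(out, buf, n);
  return room;
}
```

With -D_FORTIFY_SOURCE=0 none of this substitution happens and only the plain memcpy call remains.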

This is obviously a compiler bug. If you really, really must use Apple's GCC to compile the code, a workaround is to move the offending code into a separate function, as the bug apparently only affects code that gets extracted from parallel regions:

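#include <string.h>
#include <emmintrin.h>

/* Ordinary function: its body is not part of the extracted parallel region. */
void func(char *temp, char *argv0)
{
  __m128d v_a, v_ar;
  memcpy(temp, argv0, 10);
  v_ar = _mm_shuffle_pd(v_a, v_a, _MM_SHUFFLE2 (0,1));
}

int main(int argc, char *argv[])
{
  char *temp;  /* still uninitialized, as in the question's example */
#pragma omp parallel
  {
    func(temp, argv[0]);
  }
}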

The overhead of one additional function call is negligible compared to the overhead of entering and exiting the parallel region. You can use OpenMP pragmas inside func; they will work because of the dynamic scoping of the parallel region.
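A minimal sketch of that last point, using only standard OpenMP: the worksharing directive sits inside a separate function (a so-called orphaned directive) and binds to whatever parallel region is active in the caller. The names scale_chunk and scale_all are invented for this illustration. If the code is built without -fopenmp, the pragmas are ignored and the loop simply runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical worker: contains an orphaned worksharing directive. */
static void scale_chunk(double *x, int n, double factor)
{
  int i;
#pragma omp for
  for (i = 0; i < n; i++)
    x[i] *= factor;   /* each index is handled by exactly one thread */
}

/* Hypothetical driver: opens the parallel region and calls the worker
   from every thread in the team. */
static void scale_all(double *x, int n, double factor)
{
  assert(n >= 0);
#pragma omp parallel
  {
    scale_chunk(x, n, factor);
  }
}
```

For example, calling scale_all(x, 4, 2.0) on an array holding 1, 2, 3, 4 leaves it holding 2, 4, 6, 8, regardless of how many threads take part.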

Maybe Apple will provide a fixed compiler in the future, maybe they won't, given their commitment to replacing GCC with Clang.

answered Sep 29 '22 10:09

Hristo Iliev
