Segmentation fault using OpenMp and SSE

Tags: c, gcc, sse, openmp

I'm just getting started experimenting with adding OpenMP to some SSE code.

My first test program SOMETIMES crashes in _mm_set_ps, but works when I change the clause to if (0), which makes the parallel region run on a single thread.

It looks so simple that I must be missing something obvious. I'm compiling with gcc -fopenmp -g -march=core2 -pthreads

  #include <stdio.h>
  #include <stdlib.h>
  #include <immintrin.h>

  int main()
  {
  #pragma omp parallel if (1)
   {
  #pragma omp sections
       {
  #pragma omp section
           {
              __m128 x1 = _mm_set_ps ( 1.1f, 2.1f, 3.1f, 4.1f );
           }
  #pragma omp section
           {
              __m128 x2 = _mm_set_ps ( 1.2f, 2.2f, 3.2f, 4.2f );
           }
       } // end omp sections
   } //end omp parallel

  return 0;
  }
Ian Shaw asked Jul 16 '11 10:07


2 Answers

This is a bug in the OpenMP implementation. I was having the same problem with GCC on Windows (MinGW). The -mstackrealign command-line option solved my problem: it adds code to the prologue of every function that realigns the stack to a 16-byte boundary. I didn't notice any performance penalty. You can also try adding __attribute__ ((force_align_arg_pointer)) to a function definition, which should do the same, but only for that specific function. You might have to put the SSE code in a separate function that you then call from the function containing the #pragma omp, so that the stack has a chance to be realigned.

I stopped having the problem when I moved on to compiling for a 64-bit target (MinGW64, such as the TDM-GCC build); the 64-bit Windows ABI guarantees 16-byte stack alignment.

I am also experimenting with AVX instructions, whose aligned loads and stores of __m256 values require 32-byte alignment, but GCC doesn't support 32-byte stack alignment on Windows at all. This forced me to patch the generated assembly code with a Python script, but it works.

Norbert P. answered Sep 28 '22 07:09


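I smell unaligned memory access. It's the only way code like that could crash (assuming that is the only code there). For that to happen, the values wouldn't be kept in the XMM registers but in stack memory, which may only be aligned to 4 bytes. Since the program is built with -g and no -O flag, GCC keeps __m128 locals on the stack and writes them with aligned stores such as movaps, which fault if the address is not a multiple of 16. My guess is that the OpenMP code is messing up the alignment of the stack.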

Necrolis answered Sep 28 '22 08:09