
Wrapper for `__m256` Producing Segmentation Fault with Constructor - Windows 64 + MinGW + AVX Issues

I have a union that looks like this

#include <immintrin.h>

union bareVec8f {
    __m256 m256; // AVX vector of 8 floats
    float floats[8];
    int ints[8];
    inline bareVec8f(){
    }
    // converting constructor: takes the vector by value
    inline bareVec8f(__m256 vec){
        this->m256 = vec;
    }
    inline bareVec8f &operator=(__m256 m256) {
        this->m256 = m256;
        return *this;
    }

    inline operator __m256 &() {
        return m256;
    }
};

The __m256 member needs to be aligned on a 32-byte boundary to be used with AVX functions, and it should be aligned automatically, even inside the union.

And when I do this

bareVec8f test = _mm256_set1_ps(1.0f);

I get a segmentation fault, even though the converting constructor I wrote should make this valid. However, when I do this

bareVec8f test;
test.m256 = _mm256_set1_ps(8.f);

I do not get a segmentation fault.

Since that works, the union itself is probably aligned correctly; the segmentation fault seems to be triggered only when going through the constructor.

I'm using the 64-bit GCC (MinGW) compiler on Windows.

EDIT: Matt managed to produce the simplest example of the error that seems to be happening here.

#include <immintrin.h>

void foo(__m256 x) {}

int main()
{
    __m256 r = _mm256_set1_ps(0.0f);
    foo(r);
}

I'm compiling with -std=c++11 -mavx

Thomas asked Jun 18 '15 21:06


1 Answer

This is a bug in g++ for Windows: it does not align the stack to 32 bytes when it should. See Bug 49001 and Bug 54412.


On this SO thread someone made a Python script to process the assembly output by g++ to fix the problem, so that would be one option.

Otherwise, to avoid this in your union, you could change the functions that take __m256 by value so that they take it by reference instead. This shouldn't carry any performance penalty unless optimization is low or off.
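A minimal sketch of that by-reference change, applied to the union from the question. The `sum8` helper is hypothetical and exists only to show a free function that accepts the vector by reference. The `#pragma GCC target` line is an assumption added so the sketch builds on GCC without `-mavx`; it is redundant if you already pass that flag. This only removes the by-value `__m256` parameters; it is not a guarantee that the compiler never places a vector on a misaligned stack slot elsewhere.

```cpp
#include <immintrin.h>

// Assumption: enables AVX for the functions below on GCC, like -mavx would.
#pragma GCC target("avx")

// The union from the question, with every __m256 parameter taken by
// const reference instead of by value.
union bareVec8f {
    __m256 m256;      // AVX vector of 8 floats
    float floats[8];
    int ints[8];

    bareVec8f() {}
    bareVec8f(const __m256 &vec) { m256 = vec; }
    bareVec8f &operator=(const __m256 &vec) {
        m256 = vec;
        return *this;
    }
    operator __m256 &() { return m256; }
};

// The type itself should carry the 32-byte alignment of __m256.
static_assert(alignof(bareVec8f) == 32, "union should inherit __m256 alignment");

// Hypothetical helper: adds up the eight lanes of a vector passed by reference.
float sum8(const __m256 &v) {
    float lanes[8];
    _mm256_storeu_ps(lanes, v); // unaligned store, so lanes needs no special alignment
    float total = 0.0f;
    for (float lane : lanes) total += lane;
    return total;
}
```

With this version, `bareVec8f test = _mm256_set1_ps(1.0f);` still compiles unchanged, because the temporary binds to the `const __m256 &` parameter.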

In case you are unaware: union aliasing causes undefined behaviour in C++. It is not permitted to write m256 and then read floats or ints, for example. So perhaps there is a different solution to your problem.
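One possible alternative, sketched under the assumption that you only need to inspect individual lanes occasionally: copy the lanes out with a store intrinsic or `std::memcpy` instead of reading an inactive union member. The function names `lanes` and `laneBits` are hypothetical. GCC does document union type punning as a supported extension, so this matters mainly for code that should stay within standard C++. As above, the `#pragma GCC target` line is only there so the sketch builds on GCC without `-mavx`.

```cpp
#include <cstdint>
#include <cstring>
#include <immintrin.h>

// Assumption: enables AVX for the functions below on GCC, like -mavx would.
#pragma GCC target("avx")

// Hypothetical helper: copies the eight float lanes into a plain array.
inline void lanes(const __m256 &v, float (&out)[8]) {
    _mm256_storeu_ps(out, v); // unaligned store, valid for any float array
}

// Hypothetical helper: copies the raw 32-bit patterns of the eight lanes.
// memcpy is the standard-conforming way to reinterpret the bytes.
inline void laneBits(const __m256 &v, std::int32_t (&out)[8]) {
    static_assert(sizeof(out) == sizeof(__m256), "sizes must match");
    std::memcpy(out, &v, sizeof(out));
}
```

Compilers typically turn both helpers into plain vector stores when optimization is on, so this usually costs no more than the union would.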

M.M answered Oct 02 '22 19:10