Using std::atomic with aligned classes

Tags:

c++

c++11

sse

I have a mat4 class, a 4x4 matrix that uses sse intrinsics. This class is aligned using _MM_ALIGN16, because it stores the matrix as a set of __m128's. The problem is, when I declare an atomic<mat4>, my compiler yells at me:

f:\program files (x86)\microsoft visual studio 12.0\vc\include\atomic(504): error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned

This is the same error I get when I try to pass any class aligned with _MM_ALIGN16 as an argument for a function (without using const &).

How can I declare an atomic version of my mat4 class?

Haydn V. Harach asked Feb 13 '14 00:02


3 Answers

The MSC compiler has never supported more than 4 bytes of alignment for parameters on the x86 stack, and there is no workaround.

You can verify this yourself by compiling,

struct A { __declspec(align(4)) int x; }; 
void foo(A a) {}                      

versus,

// passing this A by value (void foo(A a) {}) won't compile: the alignment guarantee can't be fulfilled
struct A { __declspec(align(8)) int x; };

versus,

// __m128d is naturally 16-byte aligned, so passing this A by value again won't compile
struct A { __m128d x; };

MSC's documentation covers this case with the following statement:

You cannot specify alignment for function parameters.

align (C++)

And you cannot specify the alignment because the MSC authors wanted to reserve the freedom to decide on it themselves:

The x86 compiler uses a different method for aligning the stack. By default, the stack is 4-byte aligned. Although this is space efficient, you can see that there are some data types that need to be 8-byte aligned, and that, in order to get good performance, 16-byte alignment is sometimes needed. The compiler can determine, on some occasions, that dynamic 8-byte stack alignment would be beneficial—notably when there are double values on the stack.

The compiler does this in two ways. First, the compiler can use link-time code generation (LTCG), when specified by the user at compile and link time, to generate the call-tree for the complete program. With this, it can determine regions of the call-tree where 8-byte stack alignment would be beneficial, and it determines call-sites where the dynamic stack alignment gets the best payoff. The second way is used when the function has doubles on the stack, but, for whatever reason, has not yet been 8-byte aligned. The compiler applies a heuristic (which improves with each iteration of the compiler) to determine whether the function should be dynamically 8-byte aligned.

Windows Data Alignment on IPF, x86, and x64

Thus as long as you use MSC with the 32-bit platform toolset, this issue is unavoidable.

The x64 ABI has been explicit about the alignment, defining that non-trivial structures or structures over certain sizes are passed as a pointer parameter. This is elaborated in Section 3.2.3 of the ABI, and MSC had to implement this to be compatible with the ABI.

Path 1: Use another Windows compiler toolchain: GCC or ICC.

Path 2: Move to a 64-bit platform MSC toolset

Path 3: Reduce your use cases to std::atomic<T> with T=__m128d, because it will be possible to skip the stack and pass the variable in an XMM register directly.

mockinterface answered Nov 19 '22 17:11


The atomic<T> probably has a constructor which is passed a copy of T as a (formal) parameter. For example, in the atomic header packaged with GCC 4.5:

atomic(_Tp __i) : _M_i(__i) { }

This is problematic for exactly the same reason as any other function which takes a memory-aligned type as a by-value parameter: it would be very complicated and slow for functions to keep track of memory-aligned data on the stack.

Even if the compiler allowed it, this approach would incur a significant performance penalty. Assuming you are trying to optimise for speed, I would implement a less fine-grained memory access approach: either lock access to a chunk of memory whilst performing a series of calculations, or explicitly design your program so that threads never try to access the same piece of memory.
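A minimal sketch of the first option, locking a matrix for the duration of a series of calculations, might look like the following. The names `Mat4`, `GuardedMat4` and `scale` are hypothetical, and `alignas(16)` on a plain float array stands in for the question's SSE-backed mat4. Values only cross the interface by reference, so no aligned object is passed by value.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Stand-in for the question's mat4 (assumption): 16 floats, 16-byte aligned.
struct alignas(16) Mat4 {
    float m[16];

    // Bounds-checked element access, row-major.
    float& at(std::size_t r, std::size_t c) {
        assert(r < 4 && c < 4);
        return m[r * 4 + c];
    }
};

// Hypothetical helper: guards a Mat4 with a mutex instead of std::atomic.
class GuardedMat4 {
public:
    // Copy the current value into `out` while holding the lock.
    void load(Mat4& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        out = value_;
    }

    // Overwrite the stored value while holding the lock.
    void store(const Mat4& in) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = in;
    }

    // Run a whole series of calculations on the matrix under one lock.
    template <typename F>
    void update(F&& f) {
        std::lock_guard<std::mutex> lock(mutex_);
        f(value_);
    }

private:
    mutable std::mutex mutex_;
    Mat4 value_{};  // zero-initialised
};

// Example update: scale every element by s in a single critical section.
inline void scale(GuardedMat4& g, float s) {
    g.update([s](Mat4& m) {
        for (std::size_t i = 0; i < 16; ++i) m.m[i] *= s;
    });
}
```

A caller would `store` a matrix once, then call `update` (or a helper such as `scale`) from any thread. Whether this beats a lock-free design depends on contention, which this sketch does not measure.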

andypea answered Nov 19 '22 17:11


I faced a similar problem using Agner Fog's vectorclass in MSVC. The problem happens in 32-bit mode. If you compile in 64-bit mode, I don't think you will have this problem. On Windows and Unix the stack is aligned to 16 bytes in 64-bit mode, but not necessarily in 32-bit mode. In his manual, under compile-time errors, he writes:

"error C2719: formal parameter with __declspec(align('16')) won't be aligned". The Microsoft compiler cannot handle vectors as function parameters. The easiest solution is to change the parameter to a const reference, e.g.: Vec4f my_function(Vec4f const & x) { ... }

So if you use a const reference (as you mentioned) when you pass your class to a function it should work in 32-bit mode as well.
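As a rough illustration of the const-reference form, here is a small sketch. `Mat4`, `trace` and `transpose` are hypothetical names, and `alignas(16)` stands in for `_MM_ALIGN16` so that the example does not depend on an intrinsics header.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for an SSE-backed matrix (assumption): alignas(16) plays the role
// of _MM_ALIGN16 here.
struct alignas(16) Mat4 {
    float m[16];  // row-major
};

// Taking the argument by value, e.g. `float trace(Mat4 a)`, is the form the
// thread reports 32-bit MSVC rejecting with C2719. A const reference passes
// only an address, so no aligned object has to live in the parameter area.
inline float trace(const Mat4& a) {
    return a.m[0] + a.m[5] + a.m[10] + a.m[15];
}

// The result is written through a reference as well, which keeps the aligned
// type out of the by-value parameter list entirely.
inline void transpose(const Mat4& in, Mat4& out) {
    assert(&in != &out);  // this sketch does not support in-place transposition
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out.m[c * 4 + r] = in.m[r * 4 + c];
}
```

This only addresses functions you write yourself. It does not change the signatures inside `<atomic>`, which is why the wrapper idea in the edit to this answer is still needed for `atomic<mat4>`.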

Edit: Based on the question "Self-contained, STL-compatible implementation of std::vector", I think you can use a "thin wrapper". Something like:

#include <atomic>

// Thin wrapper: inherits from T and adds only constructors.
template <typename T>
struct wrapper : public T
{
    wrapper() {}
    wrapper(const T& rhs) : T(rhs) {}
};

struct __declspec(align(64)) mat4
{
    //float x, y, z, w;
};

int main()
{
    std::atomic< wrapper<mat4> > m;  // OK, no C2719 error
    return 0;
}

Z boson answered Nov 19 '22 16:11