Possible Duplicate:
Qt, GCC, SSE and stack alignment
I am converting a simulator from TinyPTC to WxWidgets. Some graphics routines are optimized with SSE intrinsics. During the initialization of the GUI, the initial state is rendered once, and all of the SSE routines work perfectly. However, if I call them later from an event handler, I get a SIGSEGV.
At first I thought those were some weird alignment issues, but it even happens for:
__m128i zero = _mm_setzero_si128();
When I replace the SSE routines with non-optimized code, everything works fine.
I suppose the event handling happens in a different thread than the initialization. Is there anything to watch out for when using SSE from different threads? What else could possibly cause this behavior?
The SIGSEGV happens at a movdqa %xmm0, -40(%ebp) instruction (there are several of those). If I compile with -O1, the movdqa instructions are completely optimized away, and the program runs fine. It seems to be an alignment issue with the stack after all, as already pointed out in the comments.
Here is the command CodeLite generates for compilation:
g++ -c "x:/some/folder/sse.cpp" -g -O1 -Wall -std=gnu++0x -msse3
-mthreads -DHAVE_W32API_H -D__WXMSW__ -D__WXDEBUG__ -D_UNICODE
-ID:\CodeLite\wxWidgets\lib\gcc_dll\mswud -ID:\CodeLite\wxWidgets\include
-DWXUSINGDLL -Wno-ctor-dtor-privacy -pipe -fmessage-length=0 -o ./Debug/sse.o -I.
Anything unusual? Is it possible that WxWidgets changes the alignment settings somewhere?
Your stack pointer is probably misaligned. Aligned SSE loads and stores such as movdqa require their memory operand to be 16-byte aligned. The fault isn't in the _mm_setzero_si128 intrinsic itself, which only zeroes an SSE register, but in the instruction the compiler generated to store that register to a slot on the stack.
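One way to confirm this diagnosis is to print the address of a local SSE variable from inside the event handler and check whether it is a multiple of 16. The following is only a minimal sketch; the helper names is_aligned_16, require_aligned_16 and report_alignment are made up for illustration and are not part of wxWidgets or GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: true if the address is a multiple of 16,
// which is what movdqa needs for its memory operand.
inline bool is_aligned_16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical debug guard: fails with a readable assertion
// instead of a SIGSEGV somewhere inside the SSE routine.
inline void require_aligned_16(const void* p)
{
    assert(is_aligned_16(p) && "address is not 16-byte aligned");
}

// Hypothetical diagnostic: log where the check was made and the result.
inline void report_alignment(const char* where, const void* p)
{
    std::printf("%s: %p is %s16-byte aligned\n",
                where, p, is_aligned_16(p) ? "" : "NOT ");
}
```

Calling report_alignment("event handler", &zero) once during initialization and once from the handler would show whether the two call paths reach the function with differently aligned stacks. Keep in mind that taking the address of the variable can change the code the compiler generates, so treat the output as a hint rather than proof.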
First make sure you're not using an outdated version of GCC (older versions had issues with stack alignment with SSE). Then, try also adding the -mstackrealign option for that translation unit, which will forcibly realign the stack to 16-byte alignment on function entry (which adds a very tiny runtime cost).
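If realigning every function in the translation unit seems too broad, GCC on x86 also offers the force_align_arg_pointer function attribute, which requests the same realigning prologue for a single function. The sketch below assumes an x86 target and GCC (or a compatible compiler); the function name sum_lanes_realigned is hypothetical and stands in for whatever entry point the event handler calls into.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical entry point called from an event handler.
// The attribute asks GCC to realign the stack in this function's
// prologue, so aligned stores to locals are safe even if the caller
// arrived with a stack that is only 4-byte aligned.
__attribute__((force_align_arg_pointer))
std::uint32_t sum_lanes_realigned()
{
    __m128i zero = _mm_setzero_si128();
    __m128i ones = _mm_set1_epi32(1);
    __m128i sum  = _mm_add_epi32(zero, ones);  // each 32-bit lane holds 1

    alignas(16) std::uint32_t out[4];
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);

    // Aligned store: this is the kind of access that faults on a
    // misaligned stack slot.
    _mm_store_si128(reinterpret_cast<__m128i*>(out), sum);

    return out[0] + out[1] + out[2] + out[3];
}
```

Only the outermost function reached from foreign code (for example, a callback invoked by the GUI toolkit or the operating system) should need the attribute, provided the functions it calls are compiled by the same GCC, which keeps the stack aligned across its own calls.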
See Volume 2B page 4-67 of the Intel Architectures Software Developer Manuals for more details on the movdqa instruction and the exact conditions under which it can generate exceptions.