For some reason one of my functions is calling an SSE instruction movaps
with unaligned parameter, which causes a crash. It happens on the first line of the function, the rest is needed to be there just for crash to happen, but is ommited for clarity.
Vec3f CrashFoo(
const Vec3f &aVec3,
const float aFloat,
const Vec2f &aVec2)
{
const Vec3f vecNew =
Normalize(Vec3f(aVec3.x, aVec3.x, std::max(aVec3.x, 0.0f)));
// ...
}
This is how I call it from the debugging main:
int32_t main(int32_t argc, const char *argv[])
{
Vec3f vec3{ 0.00628005248f, -0.999814332f, 0.0182171166f };
Vec2f vec2{ 0.947231591f, 0.0522233732f };
float floatVal{ 0.010f };
Vec3f vecResult = CrashFoo(vec3, floatVal, vec2);
return (int32_t)vecResult.x;
}
This is the disassembly from the beginning of the CrashFoo
function to the line where it crashes:
00007FF7A7DC34F0 mov rax,rsp
00007FF7A7DC34F3 mov qword ptr [rax+10h],rbx
00007FF7A7DC34F7 push rdi
00007FF7A7DC34F8 sub rsp,80h
00007FF7A7DC34FF movaps xmmword ptr [rax-18h],xmm6
00007FF7A7DC3503 movss xmm6,dword ptr [rdx]
00007FF7A7DC3507 movaps xmmword ptr [rax-28h],xmm7
00007FF7A7DC350B mov dword ptr [rax+18h],0
00007FF7A7DC3512 mov rdi,r9
00007FF7A7DC3515 mov rbx,rcx
00007FF7A7DC3518 movaps xmmword ptr [rax-38h],xmm8
00007FF7A7DC351D movaps xmmword ptr [rax-48h],xmm9
00007FF7A7DC3522 movaps xmmword ptr [rax-58h],xmm10
00007FF7A7DC3527 lea rax,[rax+18h]
00007FF7A7DC352B xorps xmm8,xmm8
00007FF7A7DC352F comiss xmm8,xmm6
00007FF7A7DC3533 movaps xmmword ptr [rax-68h],xmm11
My understanding is that it first does the usual function call stuff and then it starts preparing the playground by saving the current content of some SSE registers (xmm6
-xmm11
) onto the stack so that they are free to be used by the subsequent code. The xmm*
registers are stored one after another to addresses from [rax-18h]
to [rax-68h]
, which are nicely aligned to 16 bytes since rax=0xe4d987f788
, but before the xmm11
register gets stored, the rax
is increased by 18h which breaks the alignment causing crash. The xorps
and comiss
lines is where the actual code starts (std::max
's comparison with 0). When I remove std::max
it works nicely.
Do you see any reason for this behaviour?
I uploaded a small compilable example that crashes for me in my Visual Studio, but not in the IDEone.
The code is compiled in Visual Studio 2013 Update 5 (x64 release, v120). I've set the "Struct Member Alignment" setting of the project to 16 bytes, but with little improvement and there are no packing pragma
in the structures that I use. The error message is:
First-chance exception at 0x00007ff7a7dc3533 in PG3Render.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.
gcc and clang are both fine, and make non-crashing non-vectorized code for your example. (Of course, I'm compiling for the Linux SysV ABI where none of the vector regs are caller-saved, so they weren't generating code to save xmm{6..15} on the stack in the first place.)
Your IDEone link doesn't demonstrate a crash either, so IDK. I there are online compile & run sites that have MSVC as an option. You can even get asm out of them if your program uses system
to run a disassembler on itself. :P
The asm output you posted is guaranteed to crash, for any possible value of rax
:
00007FF7A7DC3522 movaps xmmword ptr [rax-58h],xmm10
00007FF7A7DC3527 lea rax,[rax+18h]
...
00007FF7A7DC3533 movaps xmmword ptr [rax-68h],xmm11
Accounting for the LEA, the second store address is [init_rax-50h]
, which is only 8B offset from the earlier stores. One or the other will fault. This appears to be a compiler bug that you should report.
I have no idea why your compiler would use lea
instead of add rax, 18h
. It does it right before clobbering the flags with a comiss
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With