 

MOVAPS accesses unaligned address

For some reason, one of my functions executes the SSE instruction movaps with an unaligned memory operand, which causes a crash. It happens on the first line of the function. The rest of the function has to be present for the crash to occur, but I've omitted it for clarity.

Vec3f CrashFoo(
    const Vec3f &aVec3,
    const float  aFloat,
    const Vec2f &aVec2)
{
    const Vec3f vecNew =
        Normalize(Vec3f(aVec3.x, aVec3.x, std::max(aVec3.x, 0.0f)));

    // ...
}

This is how I call it from the debugging main:

int32_t main(int32_t argc, const char *argv[])
{
    Vec3f vec3{ 0.00628005248f, -0.999814332f, 0.0182171166f };
    Vec2f vec2{ 0.947231591f, 0.0522233732f };
    float floatVal{ 0.010f };

    Vec3f vecResult = CrashFoo(vec3, floatVal, vec2);

    return (int32_t)vecResult.x;
}

This is the disassembly from the beginning of the CrashFoo function to the line where it crashes:

00007FF7A7DC34F0  mov         rax,rsp  
00007FF7A7DC34F3  mov         qword ptr [rax+10h],rbx  
00007FF7A7DC34F7  push        rdi  
00007FF7A7DC34F8  sub         rsp,80h  
00007FF7A7DC34FF  movaps      xmmword ptr [rax-18h],xmm6  
00007FF7A7DC3503  movss       xmm6,dword ptr [rdx]  
00007FF7A7DC3507  movaps      xmmword ptr [rax-28h],xmm7  
00007FF7A7DC350B  mov         dword ptr [rax+18h],0  
00007FF7A7DC3512  mov         rdi,r9  
00007FF7A7DC3515  mov         rbx,rcx  
00007FF7A7DC3518  movaps      xmmword ptr [rax-38h],xmm8  
00007FF7A7DC351D  movaps      xmmword ptr [rax-48h],xmm9  
00007FF7A7DC3522  movaps      xmmword ptr [rax-58h],xmm10  
00007FF7A7DC3527  lea         rax,[rax+18h]  
00007FF7A7DC352B  xorps       xmm8,xmm8  
00007FF7A7DC352F  comiss      xmm8,xmm6  
00007FF7A7DC3533  movaps      xmmword ptr [rax-68h],xmm11  

My understanding is that it first does the usual function prologue and then prepares the playground by saving the current contents of some SSE registers (xmm6-xmm11) onto the stack, so that the subsequent code is free to use them. The xmm* registers are stored one after another at displacements from [rax-18h] to [rax-68h] in steps of 10h, which would all be nicely aligned to 16 bytes since rax=0xe4d987f788. However, before xmm11 gets stored, rax is increased by 18h, which breaks the alignment and causes the crash. The xorps and comiss lines are where the actual code starts (std::max's comparison with 0). When I remove std::max, it works nicely.

Do you see any reason for this behaviour?

Additional info

I uploaded a small compilable example that crashes for me in Visual Studio, but not on IDEone.

The code is compiled with Visual Studio 2013 Update 5 (x64 Release, v120 toolset). I've set the project's "Struct Member Alignment" setting to 16 bytes, with little improvement, and there are no packing pragmas in the structures that I use. The error message is:

First-chance exception at 0x00007ff7a7dc3533 in PG3Render.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.

ivokabel asked Oct 18 '22 14:10


1 Answer

gcc and clang are both fine, and generate non-crashing, non-vectorized code for your example. (Of course, I'm compiling for the Linux SysV ABI, where none of the vector regs are call-preserved (callee-saved), unlike Windows x64 where xmm6-xmm15 are, so they weren't generating code to save xmm{6..15} on the stack in the first place.)

Your IDEone link doesn't demonstrate a crash either, so IDK. I think there are online compile-and-run sites that have MSVC as an option. You can even get asm out of them if your program uses system() to run a disassembler on itself. :P


The asm output you posted is guaranteed to crash, for any possible value of rax:

00007FF7A7DC3522  movaps      xmmword ptr [rax-58h],xmm10  
00007FF7A7DC3527  lea         rax,[rax+18h]  
...
00007FF7A7DC3533  movaps      xmmword ptr [rax-68h],xmm11

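Accounting for the LEA, the second store's address is [init_rax-50h], which is only 8 bytes away from the [init_rax-58h] used by the earlier xmm10 store. Two addresses 8 bytes apart can't both be 16-byte aligned, so one store or the other will fault. This appears to be a compiler bug that you should report.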

I have no idea why your compiler would use lea instead of add rax, 18h. It does this right before comiss overwrites the flags anyway, so there's nothing to gain from lea leaving the flags untouched.

Peter Cordes answered Nov 15 '22 13:11