gcc -O3 optimize :: xmm0 register?

Question

I was writing a vsprintf function for my 64-bit OS kernel (written in C) and checked that it works correctly under Visual Studio and Cygwin gcc. Then I put it into my kernel and ran it... but the kernel doesn't work.

I debugged it and found the problem: vsprintf contains the following assembly code:

movdqa xmm0,XMMWORD PTR [rip+0x0]

The real problem is that I NEVER use floating point!

I guess this comes from gcc's optimization, and that seems to be right, because it works without optimization.

Is there any solution, for example a gcc option that disables optimizations that use XMM registers?

kennytm · Accepted Answer

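The XMM register move instructions are generated because, in the System V AMD64 ABI, floating-point arguments are passed in XMM0–XMM7.

Just by looking at the variadic function, the compiler cannot know whether any floating-point values were passed, so it must also generate instructions that copy XMM0–XMM7 into the register save area that the va_list reads from. (The caller sets %al to an upper bound on the number of vector registers it used, which is why the prologue below tests %al and skips the XMM stores when it is zero.)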

You could use the -mno-sse flag to disable SSE. For example,

#include <stdarg.h>
#include <stdio.h>

__attribute__((noinline))
void f(const char* x, ...) {
    va_list va;
    va_start(va, x);
    vprintf(x, va);
    va_end(va);
}

Without the -mno-sse flag:

subq    $0x000000d8,%rsp
testb   %al,%al
movq    %rsi,0x28(%rsp)
movq    %rdx,0x30(%rsp)
movq    %rcx,0x38(%rsp)
movq    %r8,0x40(%rsp)
movq    %r9,0x48(%rsp)
je  0x100000f1b
movaps  %xmm0,0x50(%rsp)
movaps  %xmm1,0x60(%rsp)
movaps  %xmm2,0x70(%rsp)
movaps  %xmm3,0x00000080(%rsp)
movaps  %xmm4,0x00000090(%rsp)
movaps  %xmm5,0x000000a0(%rsp)
movaps  %xmm6,0x000000b0(%rsp)
movaps  %xmm7,0x000000c0(%rsp)
0x100000f1b:
leaq    0x000000e0(%rsp),%rax
movl    $0x00000008,0x08(%rsp)
movq    %rax,0x10(%rsp)
leaq    0x08(%rsp),%rsi
leaq    0x20(%rsp),%rax
movl    $0x00000030,0x0c(%rsp)
movq    %rax,0x18(%rsp)
callq   0x100000f6a ; symbol stub for: _vprintf
addq    $0x000000d8,%rsp
ret

With the -mno-sse flag:

subq    $0x58,%rsp
leaq    0x60(%rsp),%rax
movq    %rsi,0x28(%rsp)
movq    %rax,0x10(%rsp)
leaq    0x08(%rsp),%rsi
leaq    0x20(%rsp),%rax
movq    %rdx,0x30(%rsp)
movq    %rcx,0x38(%rsp)
movq    %r8,0x40(%rsp)
movq    %r9,0x48(%rsp)
movl    $0x00000008,0x08(%rsp)
movq    %rax,0x18(%rsp)
callq   0x100000f6a ; symbol stub for: _vprintf
addq    $0x58,%rsp
ret

You could also use the target attribute to disable SSE just for that function, e.g.

__attribute__((noinline, target("no-sse")))
//                       ^^^^^^^^^^^^^^^^
void f(const char* x, ...) {
    va_list va;
    va_start(va, x);
    vprintf(x, va);
    va_end(va);
}

But be warned that callers compiled with SSE support won't know that f doesn't use SSE, so calling f with floating-point arguments will cause undefined behavior:

int main() {
    f("%g %g", 1.0, 2.0);  // 1.0 and 2.0 are stored in XMM0–1
                           // So this will print garbage e.g. `0 6.95326e-310`
}


Tags: c, optimization, gcc, sse

ikh

kennytm
