 

g++ handling of copying a std::complex

Tags: c++, gcc, assembly

As part of a self-education project I looked into how g++ handles the std::complex type and was puzzled by this simple function:

#include <complex>  
std::complex<double> c;

void get(std::complex<double> &res){
    res=c;
}

Compiled with g++-6.3 -O3 (or -Os) for 64-bit Linux, I got this result:

    movsd   c(%rip), %xmm0
    movsd   %xmm0, (%rdi)
    movsd   c+8(%rip), %xmm0
    movsd   %xmm0, 8(%rdi)
    ret

So it moves the real and imaginary parts individually as 64-bit doubles. However, I would expect the assembly to use two movups instead of four movsd, i.e. to move the real and imaginary parts together as one 128-bit packet:

    movups  c(%rip), %xmm0
    movups  %xmm0, (%rdi)
    ret

On my machine (Intel Broadwell) this is not only twice as fast as the movsd version, it also needs only 16 bytes, whereas the movsd version needs 36 bytes.

Why does g++ generate assembly with movsd?

  1. Is there an additional compiler flag that triggers the use of movups, which I should pass alongside -O3?
  2. Are there disadvantages of using movups that I'm not aware of?
  3. Does g++ simply not produce optimal assembly here?
  4. Something else?

More context: I am trying to compare two possible function signatures:

std::complex<double> get(){
    return c;
}

and

void get(std::complex<double> &res){
    res=c;
}

The first version has to put the real part and the imaginary part into different registers (xmm0 and xmm1) because of the System V ABI. With the second version, however, one could try to take advantage of the SSE operations that work on 128 bits at once, but my g++ version does not do so.
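For illustration, here is a minimal sketch of how the 128-bit copy could be requested explicitly instead of relying on the optimizer. It assumes an x86 target with SSE2 and uses the intrinsics _mm_loadu_pd and _mm_storeu_pd from <emmintrin.h>; the names get_packed and check_get_packed are hypothetical and not part of the original code. It relies on the guarantee that std::complex<double> is laid out like double[2]. On targets without SSE2 it falls back to a fixed-size memcpy. Which instructions the compiler actually emits should still be checked in the generated assembly.

```cpp
#include <cassert>
#include <complex>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: _mm_loadu_pd, _mm_storeu_pd
#endif

// std::complex<double> stores the real part followed by the imaginary
// part, so the whole value occupies 16 contiguous bytes.
static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
              "expected real and imaginary parts to be packed back to back");

std::complex<double> c;  // same global as in the question

// Hypothetical variant of get(): copy both parts with one unaligned
// 128-bit load and one unaligned 128-bit store.
void get_packed(std::complex<double> &res) {
#if defined(__SSE2__)
    __m128d v = _mm_loadu_pd(reinterpret_cast<const double *>(&c));
    _mm_storeu_pd(reinterpret_cast<double *>(&res), v);
#else
    // Portable fallback (assumption: the compiler lowers this fixed-size
    // copy to whatever wide move the target offers).
    std::memcpy(reinterpret_cast<double *>(&res),
                reinterpret_cast<const double *>(&c), sizeof c);
#endif
}

// Hypothetical self-check: the packed copy must match a plain assignment.
void check_get_packed() {
    c = std::complex<double>(1.5, -2.25);
    std::complex<double> out;
    get_packed(out);
    assert(out == c);
}
```

Unaligned loads and stores are used because nothing in the signature guarantees that res is 16-byte aligned.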


Edit: As kennytm's answer suggests, g++ seems to produce non-optimal assembly. It always uses four movsd instructions to copy a std::complex from one memory location to another, as for example in

void get(std::complex<double> *res){
    res[1]=res[0];
}

A bug report has now been filed in the GCC Bugzilla.

asked Apr 08 '17 17:04 by ead


1 Answer

3. g++ does not produce optimal assembly here.

Both clang and icc use only one SSE register. You can check the compiled code at https://godbolt.org/g/55lPv0.

get(std::complex<double>&):
        movups    c(%rip), %xmm0
        movups    %xmm0, (%rdi)  
        ret
answered Sep 28 '22 08:09 by kennytm