Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimisation of a write out of a loop

Moving a member variable to a local variable reduces the number of writes in this loop despite the presence of the __restrict keyword. This is using GCC -O3. Clang and MSVC optimise the writes in both cases. [Note that since this question was posted we observed that adding __restrict to the calling function caused GCC to also move the store out of the loop. See the godbolt link below and the comments]

class X
{
public:
    void process(float * __restrict d, int size)
    {
        for (int i = 0; i < size; ++i)
        {
            d[i] = v * c + d[i];
            v = d[i];
        }
    }

    void processFaster(float * __restrict d, int size)
    {
        float lv = v;
        for (int i = 0; i < size; ++i)
        {
            d[i] = lv * c + d[i];
            lv = d[i];
        }
        v = lv;
    }

    float c{0.0f};
    float v{0.0f};
};

With gcc -O3 the first one has an inner loop that looks like:

.L3:
  mulss xmm0, xmm1
  add rdi, 4
  addss xmm0, DWORD PTR [rdi-4]
  movss DWORD PTR [rdi-4], xmm0
  cmp rax, rdi
  movss DWORD PTR x[rip+4], xmm0        ;<<< the extra store
  jne .L3
.L1:
  rep ret

The second here:

.L8:
  mulss xmm0, xmm1
  add rdi, 4
  addss xmm0, DWORD PTR [rdi-4]
  movss DWORD PTR [rdi-4], xmm0
  cmp rdi, rax
  jne .L8
.L7:
  movss DWORD PTR x[rip+4], xmm0
  ret

See https://godbolt.org/g/a9nCP2 for the complete code.

Why does the compiler not perform the lv optimisation here?

I'm assuming the 3 memory accesses per loop are worse than the 2 (assuming size is not a small number), though I've not measured this yet.

Am I right to make that assumption?

I think the observable behaviour should be the same in both cases.

like image 885
JCx Avatar asked Oct 23 '17 10:10

JCx


1 Answers

This seems to be caused by the missing __restrict qualifier on the f_original function. __restrict is a GCC extension; it is not quite clear how it is expected to behave in C++. Maybe it is a compiler bug (missed optimization) that it appears to disappear after inlining.

like image 123
Florian Weimer Avatar answered Oct 09 '22 09:10

Florian Weimer