Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why is std::equal much slower than a hand rolled loop for two small std::array?

I was profiling a small piece of code that is part of a larger simulation, and to my surprise, the STL function equal (std::equal) is much slower than a simple for-loop, comparing the two arrays element by element. I wrote a small test case, which I believe to be a fair comparison between the two, and the difference, using g++ 6.1.1 from the Debian archives is not insignificant. I am comparing two, four-element arrays of signed integers. I tested std::equal, operator==, and a small for loop. I didn't use std::chrono for an exact timing, but the difference can be seen explicitly with time ./a.out.

My question is, given the sample code below, why does operator== and the overloaded function std::equal (which calls operator== I believe) take approx 40s to complete, and the hand written loop take only 8s? I'm using a very recent intel based laptop. The for-loop is faster on all optimizations levels, -O1, -O2, -O3, and -Ofast. I compiled the code with g++ -std=c++14 -Ofast -march=native -mtune=native

Run the code

The loop runs a huge number of times, just to make the difference clear to the naked eye. The modulo operators represent a cheap operation on one of the array elements, and serve to keep the compiler from optimizing out of the loop.

#include<iostream>
#include<algorithm>
#include<array>

using namespace std;
using T = array<int32_t, 4>;

bool 
are_equal_manual(const T& L, const T& R)
noexcept {
    bool test{ true };
    for(uint32_t i{0}; i < 4; ++i) { test = test && (L[i] == R[i]); }
    return test;
}

bool
are_equal_alg(const T& L, const T& R)
noexcept {
    bool test{ equal(cbegin(L),cend(L),cbegin(R)) };
    return test;
}

int main(int argc, char** argv) {

    T left{ {0,1,2,3} };
    T right{ {0,1,2,3} };

    cout << boolalpha << are_equal_manual(left,right) << endl;
    cout << boolalpha << are_equal_alg(left,right) << endl;
    cout << boolalpha << (left == right) << endl;

    bool t{};
    const size_t N{ 5000000000 };
    for(size_t i{}; i < N; ++i) {
      //t = left == right; // SLOW
      //t = are_equal_manual(left,right); // FAST
        t = are_equal_alg(left,right);  // SLOW
      left[0] = i % 10;
      right[2] = i % 8;
    }

    cout<< boolalpha << t << endl;

    return(EXIT_SUCCESS);
}
like image 619
KBentley57 Avatar asked Sep 01 '16 03:09

KBentley57


People also ask

Is std :: vector slower than array?

A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.

How much slower are vectors than arrays?

The time difference is 0.333 seconds. Or a difference of 0.000000000333 per array access. Assuming a 2.33 GHz Processor like mine that's 0.7 instruction pipeline stages per array accesses. So the vector looks like it is using one extra instruction per accesses.

Is array better than vector C++?

Array is memory efficient data structure. Vector takes more time in accessing elements. Array access elements in constant time irrespective of their location as elements are arranged in a contiguous memory allocation.

What is faster array or vector C++?

The conclusion is that arrays of integers are faster than vectors of integers (5 times in my example). However, arrays and vectors are arround the same speed for more complex / not aligned data. Save this answer.


1 Answers

Here's the generated assembly of the for loop in main() when the are_equal_manual(left,right) function is used:

.L21:
        xor     esi, esi
        test    eax, eax
        jne     .L20
        cmp     edx, 2
        sete    sil
.L20:
        mov     rax, rcx
        movzx   esi, sil
        mul     r8
        shr     rdx, 3
        lea     rax, [rdx+rdx*4]
        mov     edx, ecx
        add     rax, rax
        sub     edx, eax
        mov     eax, edx
        mov     edx, ecx
        add     rcx, 1
        and     edx, 7
        cmp     rcx, rdi

And here's what's generated when the are_equal_alg(left,right) function is used:

.L20:
        lea     rsi, [rsp+16]
        mov     edx, 16
        mov     rdi, rsp
        call    memcmp
        mov     ecx, eax
        mov     rax, rbx
        mov     rdi, rbx
        mul     r12
        shr     rdx, 3
        lea     rax, [rdx+rdx*4]
        add     rax, rax
        sub     rdi, rax
        mov     eax, ebx
        add     rbx, 1
        and     eax, 7
        cmp     rbx, rbp
        mov     DWORD PTR [rsp], edi
        mov     DWORD PTR [rsp+24], eax
        jne     .L20

I'm not exactly sure what's happening in the generated code for first case, but it's clearly not calling memcmp(). It doesn't appear to be comparing the contents of the arrays at all. While the loop is still being iterated 5000000000 times, it's optimized to doing nothing much. However, the loop that uses are_equal_alg(left,right) is still performing the comparison. Basically, the compiler is still able to optimize the manual comparison much better than the std::equal template.

like image 189
Michael Burr Avatar answered Sep 18 '22 19:09

Michael Burr