Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are STL algorithms optimized for speed?

Tags:

c++

algorithm

stl

I was testing the speed of different ways loop on a std::vector. In the code below, I consider 5 ways to calculate the sum of all elements of a vector of N = 10000000 elements:

  1. using iterators
  2. using integer indices
  3. using integer indices, unrolling by a factor 2
  4. using integer indices, unrolling by a factor 4
  5. using std::accumulate

The code is compiled with g++ for windows, the command line used to compile is:

g++ -std=c++11 -O3 loop.cpp -o loop.exe

I ran the code 4 times, measuring the time of each method, I get the following results (time in microseconds, max and min are given):

  • Iterators: 8002 - 8007
  • Int indices: 8004 - 9003
  • Unroll 2: 6004 - 7005
  • Unroll 4: 4001 - 5004
  • accumulate: 8005 - 9007

What these experiments seem to indicate is:

  1. Looping with iterators vs integer indices does not make much difference, at least with full optimization.

  2. Unrolling the loop pays off

  3. Surprisingly, the stl::accumulate gives the worse performance.

While the conclusions 1 and 2 were somewat expected, the number 3 is quite surprising. Don't all books say to use the STL algorithms instead of writing loops by myself?

Am I making any mistake in the way I am measuring the time, or in the way I interprete the results? Do you guys get a different scenario in case you try out the code given below?

#include <iostream>
#include <chrono>
#include <vector>
#include <numeric>

using namespace std;
using namespace std::chrono;



int main()
{
    const int N = 10000000;
    vector<int> v(N);
    for (int i = 0; i<N; ++i)
        v[i] = i;

    //looping with iterators
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();

        long long int sum = 0;
        for (auto it = v.begin(); it != v.end(); ++it)
            sum+=*it;

        high_resolution_clock::time_point t2 = high_resolution_clock::now();

        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

        cout << duration << "microseconds  output = " << sum << " (Iterators)\n";
    }

    //looping with integers
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();

        long long int sum = 0;
        for (int i = 0; i<N; ++i)
            sum+=v[i];

        high_resolution_clock::time_point t2 = high_resolution_clock::now();

        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

        cout << duration << "microseconds  output = " << sum << " (integer index)\n";
    }

    //looping with integers (UNROLL 2)
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();

        long long int sum = 0;
        for (int i = 0; i<N; i+=2)
            sum+=v[i]+v[i+1];

        high_resolution_clock::time_point t2 = high_resolution_clock::now();

        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

        cout << duration << "microseconds  output = " << sum << " (integer index, UNROLL 2)\n";
    }

    //looping with integers (UNROLL 4)
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();

        long long int sum = 0;
        for (int i = 0; i<N; i+=4)
            sum+=v[i]+v[i+1]+v[i+2]+v[i+3];

        high_resolution_clock::time_point t2 = high_resolution_clock::now();

        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

        cout << duration << "microseconds  output = " << sum << " (integer index, UNROLL 4)\n";
    }

    //using std::accumulate
    {
        high_resolution_clock::time_point t1 = high_resolution_clock::now();

        long long int sum = accumulate(v.begin(), v.end(), static_cast<long long int>(0));

        high_resolution_clock::time_point t2 = high_resolution_clock::now();

        auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

        cout << duration << "microseconds  output = " << sum << " (std::accumulate)\n";
    }
    return 0;
}
like image 990
Giuseppe Avatar asked Mar 17 '15 23:03

Giuseppe


People also ask

Is STL fast?

STL is usually faster at runtime than either C-style solutions with callback pointers or polymorphism-based solutions with virtual methods (see also this Bjarne Stroustrup's keynote).

How STL algorithm is different from conventional algorithm?

How STL algorithms are different from the conventional algorithms? Algorithms are functions that can be used generally across a variety of containers for processing their contents. Conventional algorithms are very lengthy and complex, where STL algorithms are very easy and short that can save a lot of time and effort.

Is std :: Find faster than a for loop?

Looping Performance in C++ Today I was testing the performance of a piece of code, which is basically accessing each element in a container within a for loop. But the result is quite shocking because I found the std::for_each version is 10 times faster than the raw loop.

Is transform faster than for loop?

List comprehension with a separate transform() function is around 17% slower than the initial "for loop"-based version (224/191≈1.173). But it's much more readable, so I prefer it over the other solutions.


1 Answers

The reason for using the standard library algorithms is not to get better efficiency, it is to allow you to think at a higher level of abstraction.

While there might be some cases where the algorithm will be faster than your own hand-rolled code, that's not what they're there for. One of the great advantages of C++ is that it allows you to bypass the built-in libraries when you have a specific need. If your benchmarking has shown that the standard library is causing a critical slowdown, you are free to explore classic alternatives such as loop unrolling. For most purposes that will never be necessary.

With that said, a well written standard library algorithm will never be horribly slower than your own straight-forward implementation, unless you're taking advantage of knowledge of the specifics of your data.

like image 167
Mark Ransom Avatar answered Oct 15 '22 01:10

Mark Ransom