Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ for loop and range-based loop performance

I read that range-based loops have better performance on some programming language. Is it the case in C++. For instance;

int main()
{
    vector<int> v = {1, 2, 3, 4, 5};

    auto size = v.size();
    // LOOP1
    for (int i = 0; i < size; i++) {
        // do something with v[i]
    }

    // LOOP2    
    for (int& val : v) {
        // do something with val 
    }

    return 0;
}

Does LOOP2 perform better than LOOP1 when vector size is huge? If so, why?

like image 656
user3858202 Avatar asked Jul 23 '14 17:07

user3858202


2 Answers

Here's a crude test. I'm not saying this is a definitive answer as to which is faster, but it seems to me in this particular case, the gcc compiler is able to optimize both loops to roughly the same performance level. You can definitely improve on the testing method, if you wish.

On my system (Ubuntu 14.04, some sort of i7, 8 GB DDR3, gcc):

Without optimization (g++ main.cpp -std=c++11):

Old-fashioned loop: 5.45131 seconds.

Range-based loop: 9.90306 seconds.

With optimization (g++ main.cpp -O3 -std=c++11):

Old-fashioned loop: 0.469001 seconds.

Range-based loop: 0.467045 seconds.

#include <iostream>
#include <vector>
#include <time.h>
using namespace std;

double time_elapsed(timespec& start, timespec& end)
{
    return ((1e9 * end.tv_sec + end.tv_nsec) - 
            (1e9 * start.tv_sec + start.tv_nsec)) / 1.0e9;
}

int main()
{
    vector<int> v(1e9, 42);

    timespec start, end;

//  Old-fashioned loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    size_t size = v.size();
    for (size_t i = 0; i < size; i++)
    {
        v[i] *= v[i];
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    cout << "Old-fashioned loop: " << time_elapsed(start, end) << " seconds\n";

//  Range-based loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start); 
    for (int& val : v)
    {
        val *= val;
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    cout << "Range-based loop: " << time_elapsed(start, end) << " seconds.\n";
}
like image 66
MGA Avatar answered Sep 30 '22 01:09

MGA


I tested them both using the following code:

#include <benchmark/benchmark.h>
#include <vector>

auto get_vector(int size) {
    return std::vector<int>(size, 2333);
}
template <class F>
void BM_for_range(benchmark::State &state)
{
    for(auto _: state) {
        state.PauseTiming();

        F{}(get_vector(state.range(0)), state);//Run the test
    }
}
struct v1 {
    template <class T>
    void operator () (std::vector<T> v, benchmark::State &state) {
        state.ResumeTiming();

        auto size = v.size();
        decltype(size) i = 0;
        for (; i != size; i++) {
            v[i] *= v[i];
        }
    }
};
struct v2 {
    template <class T>
    void operator () (std::vector<T> v, benchmark::State &state) {
        state.ResumeTiming();

        for (int& val : v) {
            val *= val;
        }
    }
};

BENCHMARK_TEMPLATE(BM_for_range, v1)->Range(64, (1 << 30));
BENCHMARK_TEMPLATE(BM_for_range, v2)->Range(64, (1 << 30));
BENCHMARK_MAIN();

All the performance test are done on:

Debian 9.4, Linux version 4.9.0-6-amd64 ([email protected])(gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)

Compiled using clang++-6.0 -std=c++17 -lbenchmark -lpthread -Ofast main.cc

And got the result from running these in bash:

sudo cpupower frequency-set --governor performance
./a.out

Result:

Run on (8 X 1600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 6144K (x1)
-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_for_range<v1>/64                929 ns        932 ns     763881
BM_for_range<v1>/512              1051 ns       1053 ns     662887
BM_for_range<v1>/4096             2195 ns       2199 ns     317788
BM_for_range<v1>/32768           12484 ns      12509 ns      57699
BM_for_range<v1>/262144         114788 ns     114784 ns       6277
BM_for_range<v1>/2097152       1270037 ns    1269506 ns        554
BM_for_range<v1>/16777216     16472508 ns   16468972 ns         43
BM_for_range<v1>/134217728   130165013 ns  130136049 ns          5
BM_for_range<v1>/1073741824  986169581 ns  986168129 ns          1
BM_for_range<v2>/64                925 ns        927 ns     730624
BM_for_range<v2>/512              1060 ns       1062 ns     654477
BM_for_range<v2>/4096             2197 ns       2201 ns     319264
BM_for_range<v2>/32768           12288 ns      12314 ns      53352
BM_for_range<v2>/262144         112196 ns     112191 ns       6220
BM_for_range<v2>/2097152       1268441 ns    1267732 ns        553
BM_for_range<v2>/16777216     16627461 ns   16623624 ns         42
BM_for_range<v2>/134217728   127552348 ns  127551465 ns          6
BM_for_range<v2>/1073741824  963751205 ns  963152928 ns          1
like image 42
JiaHao Xu Avatar answered Sep 29 '22 23:09

JiaHao Xu