
How to find the runtime efficiency of C++ code

I'm trying to find the efficiency of a program that I recently posted on Stack Overflow:

How to efficiently delete elements from a vector given another vector

To compare the efficiency of my code with the other answers, I'm using std::chrono.

Is this a correct way to check runtime efficiency?

If not, kindly suggest a way to do it, with an example.

Code on Coliru

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
using namespace std;

// Erase every value that appears in vSource from vDestination.
void remove_elements(vector<int>& vDestination, const vector<int>& vSource) 
{
    if(!vDestination.empty() && !vSource.empty())
    {
        for(auto i: vSource) {
            // erase-remove idiom: compact the survivors, then trim the tail
            vDestination.erase(std::remove(vDestination.begin(), vDestination.end(), i), vDestination.end());
        }
    }
}

int main() {
    vector<int> v1={1,2,3};
    vector<int> v2={4,5,6};
    vector<int> v3={1,2,3,4,5,6,7,8,9};
    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now(); // start timing
    remove_elements(v3,v1);
    remove_elements(v3,v2);
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();   // stop timing
    std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << " ns" << std::endl;
    for(auto i:v3)
        cout << i << endl;
    return 0;
}

Output

Time difference = 1472 ns
7
8
9
asked Aug 25 '16 by user3898160




2 Answers

Is this a correct way to check runtime efficiency?

This doesn't look like the best way to do it. I see the following flaws in your method:

  1. Values are sorted. Branch prediction can produce dramatic effects when the same algorithm is tested on sorted vs. unsorted values. Possible fix: test on both sorted and unsorted data and compare the results.
  2. Values are hard-coded. The CPU cache is a tricky thing, and it may introduce subtle differences between tests on hard-coded values and real-life ones. In the real world you are unlikely to perform these operations on hard-coded values, so you should either read them from a file or generate random ones.
  3. Too few values. Your code's execution time is far smaller than the timer's precision.
  4. You run the code only once. If you fix all the other issues and run the code twice, the second run may be much faster than the first due to cache warm-up: subsequent runs tend to have fewer cache misses than the first.
  5. You run the code once on fixed-size data. It would be better to run otherwise-correct tests at least four times to cover the Cartesian product of the following parameters (see the benchmark sketch after this list):
    • sorted vs. unsorted data
    • v3 fits in the CPU cache vs. v3 exceeds the CPU cache. Also consider the cases where (v1.size() + v3.size()) * sizeof(int) fits the cache or not, (v1.size() + v2.size() + v3.size()) * sizeof(int) fits the cache or not, and so on for all combinations.
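
To make those points concrete, here is a minimal benchmark sketch of my own (not code from the question): it generates random values instead of hard-coded ones, covers sorted and unsorted inputs at a small and a large size (the exact sizes and the fixed seed are assumptions; adjust them for your CPU), and repeats each case so the cache-cold first run can be discarded:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

// The erase-remove loop from the question, minus the redundant empty checks.
void remove_elements(std::vector<int>& vDestination, const std::vector<int>& vSource)
{
    for (int i : vSource)
        vDestination.erase(std::remove(vDestination.begin(), vDestination.end(), i),
                           vDestination.end());
}

// n pseudo-random ints: neither sorted nor hard-coded.
std::vector<int> random_values(std::size_t n, std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

int main()
{
    std::mt19937 gen(42); // fixed seed: repeatable runs, but not hand-picked data

    // Assumed sizes: ~4 KB (fits in L1) vs. ~4 MB (exceeds most L1/L2 caches).
    for (std::size_t n : {1000, 1000000}) {
        for (int sorted = 0; sorted < 2; ++sorted) {
            std::vector<int> source = random_values(100, gen);
            std::vector<int> base = random_values(n, gen);
            if (sorted) {
                std::sort(source.begin(), source.end());
                std::sort(base.begin(), base.end());
            }

            long long best = -1;
            for (int run = 0; run < 5; ++run) {   // run 0 only warms the cache
                std::vector<int> dest = base;     // fresh copy every run
                auto begin = std::chrono::steady_clock::now();
                remove_elements(dest, source);
                auto end = std::chrono::steady_clock::now();
                long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count();
                if (run > 0 && (best < 0 || ns < best))
                    best = ns;                    // best of the warm runs
            }
            std::cout << "n = " << n << (sorted ? ", sorted" : ", unsorted")
                      << ": " << best << " ns\n";
        }
    }
}

Only the remove_elements call itself is timed; the per-run copy of the input keeps every run starting from identical data.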
answered Oct 21 '22 by Sergey


The biggest issues with your approach are:

1) The code you're testing is too short and too predictable. You need to run it at least a few thousand times so that there are at least a few hundred milliseconds between measurements, and you need to make the data set larger and less predictable. In general, CPU caches really make accurate measurements based on synthetic input data a PITA.
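
For illustration, here is a sketch of that repetition idea applied to the code from the question (the repetition count is an assumption; scale it until the total runtime dwarfs the clock's resolution, and note the volatile sink that stops the optimizer from discarding the work):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

void remove_elements(std::vector<int>& vDestination, const std::vector<int>& vSource)
{
    for (int i : vSource)
        vDestination.erase(std::remove(vDestination.begin(), vDestination.end(), i),
                           vDestination.end());
}

int main()
{
    const std::vector<int> v1 = {1, 2, 3};
    const std::vector<int> v2 = {4, 5, 6};
    const std::vector<int> v3 = {1, 2, 3, 4, 5, 6, 7, 8, 9};

    volatile std::size_t sink = 0;  // a side effect the optimizer must preserve
    const int repetitions = 100000; // assumption: raise until total time is a few hundred ms

    auto begin = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        std::vector<int> work = v3; // fresh input every iteration
        remove_elements(work, v1);
        remove_elements(work, v2);
        sink = sink + work.size();  // consume the result
    }
    auto end = std::chrono::steady_clock::now();

    auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count();
    std::cout << "average = " << total / repetitions << " ns per iteration" << std::endl;
}

The per-iteration copy of the input is included in the measurement, so treat the reported average as an upper bound.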

2) The compiler is free to reorder your code. In general, it's quite difficult to ensure that the code you're timing actually executes between the calls that read the clock (and that nothing else does, for that matter). On the one hand you could dial down optimization, but on the other you want to measure optimized code.

One solution is to turn off whole-program optimization and put timing calls in another compilation unit.
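
A minimal sketch of that idea (the file layout and the now_ns helper are my own, not from the answer):

// timer.h
#ifndef TIMER_H
#define TIMER_H
long long now_ns(); // defined in timer.cpp, opaque to the code under test
#endif

// timer.cpp -- compiled separately, without LTO, so the optimizer working
// on main.cpp cannot see inside this call or move code across it
#include "timer.h"
#include <chrono>
long long now_ns()
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch()).count();
}

// main.cpp
#include "timer.h"
#include <iostream>
int main()
{
    long long begin = now_ns();
    // ... code under test goes here ...
    long long end = now_ns();
    std::cout << "Time difference = " << (end - begin) << " ns" << std::endl;
}

Build it e.g. with g++ -O2 -c timer.cpp main.cpp && g++ timer.o main.o; as long as link-time optimization stays off, the opaque calls act as barriers.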

Another possible solution is to use a memory fence around your test, e.g.

    std::atomic_thread_fence(std::memory_order_seq_cst);

(requires #include <atomic> and a C++11-capable compiler).
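
Placed around the measured region, that might look like the following sketch (std::accumulate merely stands in for the code under test):

#include <atomic>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> v(1000000, 1); // stand-in workload

    std::atomic_thread_fence(std::memory_order_seq_cst);
    auto begin = std::chrono::steady_clock::now();
    std::atomic_thread_fence(std::memory_order_seq_cst);

    long long sum = std::accumulate(v.begin(), v.end(), 0LL); // code under test

    std::atomic_thread_fence(std::memory_order_seq_cst);
    auto end = std::chrono::steady_clock::now();
    std::atomic_thread_fence(std::memory_order_seq_cst);

    std::cout << "sum = " << sum << ", time = "
              << std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count()
              << " us" << std::endl;
}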

In addition, you might want to supplement your measurements with profiler data to see how efficiently the L1/L2/L3 caches are used, where the memory bottlenecks are, the instruction retire rate, and so on. Unfortunately, the best such tool for Intel x86 is commercial (VTune), while on AMD x86 a similar tool is free (CodeXL).

answered Oct 21 '22 by rustyx