I'm trying to find the efficiency of a program which I have recently posted on stackoverflow.
How to efficiently delete elements from a vector given an another vector
To compare the efficiency of my code with other answers I'm using chrono
object.
Is it a correct way to check the runtime efficiency?
If not then kindly suggest a way to do it with an example.
Code on Coliru
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <ctime>
using namespace std;
void remove_elements(vector<int>& vDestination, const vector<int>& vSource)
{
if(!vDestination.empty() && !vSource.empty())
{
for(auto i: vSource) {
vDestination.erase(std::remove(vDestination.begin(), vDestination.end(), i), vDestination.end());
}
}
}
int main() {
vector<int> v1={1,2,3};
vector<int> v2={4,5,6};
vector<int> v3={1,2,3,4,5,6,7,8,9};
std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
remove_elements(v3,v1);
remove_elements(v3,v2);
std::chrono::steady_clock::time_point end= std::chrono::steady_clock::now();
std::cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() <<std::endl;
for(auto i:v3)
cout << i << endl;
return 0;
}
Output
Time difference = 1472
7
8
9
Is it a correct way to check the runtime efficiency?
Looks like not the best way to do it. I see the following flaws in your method:
v3
fits CPU cache vs. v3
size exceeds CPU cache. Also consider cases when (v1.length() + v3.length()) * sizeof(int)
fits the cache or not, (v1.length() + v2.length() + v3.length()) * sizeof(int)
fits the cache or not and so on for all combinations.The biggest issues with your approach are:
1) The code you're testing is too short and predictable. You need to run it at least a few thousand times so that there is at least a few hundred milliseconds between measurements. And you need to make the data set larger and less predictable. In general, CPU caches really make accurate measurements based on synthetic input data a PITA.
2) The compiler is free to reorder your code. In general it's quite difficult to ensure the code you're timing will execute between calls to check the time (and nothing else, for that matter). On the one hand, you could dial down optimization, but on the other you want to measure optimized code.
One solution is to turn off whole-program optimization and put timing calls in another compilation unit.
Another possible solution is to use a memory fence around your test, e.g.
std::atomic_thread_fence(std::memory_order_seq_cst);
(requires #include <atomic>
and a C++11-capable compiler).
In addition, you might want to supplement your measurements with profiler data to see how efficiently L1/2/3 caches are used, memory bottlenecks, instruction retire rate, etc. Unfortunately the best tool for Intel x86 for that is commercial (vtune), but on AMD x86 a similar tool is free (codeXL).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With