Consider the simple code that measures execution time and the number of swaps performed:
#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include <algorithm>
struct A {
A(int i = 0) : i(i) {}
int i;
static int nSwaps;
friend void swap(A& l, A& r)
{
++nSwaps;
std::swap(l.i, r.i);
}
bool operator<(const A& r) const
{
return i < r.i;
}
};
int A::nSwaps = 0;
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::milliseconds;
int main()
{
std::vector<A> v(10000000);
std::minstd_rand gen(std::random_device{}());
std::generate(v.begin(), v.end(), [&gen]() {return gen();});
auto s = high_resolution_clock::now();
std::sort(v.begin(), v.end());
std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count()
<< "ms with " << A::nSwaps << " swaps\n";
A::nSwaps = 0;
s = high_resolution_clock::now();
std::shuffle(v.begin(), v.end(), gen);
std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count()
<< "ms with " << A::nSwaps << " swaps\n";
}
The output of the program depends on the compiler and the machine, but they are quite similar in their nature. On my laptop with VS2015, I get 1044ms with ~100 million swaps for sort and 824ms with 10 million swaps for shuffle.
libstdc++ and libc++ do twice as few swaps for sort (~50M) and the results are as follows. Rextester gives me similar results: gcc sort 854ms, shuffle 565ms, clang sort 874ms, shuffle 648ms. The results shown by ideone and coliru are even more drastic: ideone sort 1181ms, shuffle 1292ms and coliru sort 1157ms, shuffle 1461ms.
So what's the culprit here? Why with 5 to 10 times more swaps sort is almost as fast or even faster than a simple shuffle? I'm not even taking into account comparisons and more complex logic in std::sort
including choosing insertion, heap or quick sort algorithms, etc. I doubt it's the random engine - I've even chosen the simplest one std::minstd_rand
which basically does an integer multiplication and a modulo. Is it the cache misses that make shuffle relatively slow?
PS: the behaviour is the same for simple std::vector<int>
std::random_shuffle
usually works as follows:
//random(k) generates uniform random from 0 to k-1 inclusive
for (int i = 1; i < n; i++)
swap(arr[i], arr[random(i + 1)]);
So we can see two sources of inefficiency here:
Speaking of point 2, sorting algorithms like quicksort are much more cache-friendly: most of their memory accesses hit cache.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With