Why is std::shuffle as slow (or even slower than) std::sort?

Question

Consider the simple code that measures execution time and the number of swaps performed:

#include <iostream>

#include <vector>
#include <random>
#include <chrono>
#include <algorithm>

struct A {
    A(int i = 0) : i(i) {}
    int i;
    static int nSwaps;

    friend void swap(A& l, A& r)
    {
        ++nSwaps;
        std::swap(l.i, r.i);
    }

    bool operator<(const A& r) const
    {
        return i < r.i;
    }
};

int A::nSwaps = 0;

using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::milliseconds;


int main()
{
    std::vector<A> v(10000000);

    std::minstd_rand gen(std::random_device{}());
    std::generate(v.begin(), v.end(), [&gen]() {return gen();});

    auto s = high_resolution_clock::now();
    std::sort(v.begin(), v.end());
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count() 
        << "ms with " << A::nSwaps << " swaps
";

    A::nSwaps = 0;
    s = high_resolution_clock::now();
    std::shuffle(v.begin(), v.end(), gen);
    std::cout << duration_cast<milliseconds>(high_resolution_clock::now() - s).count() 
        << "ms with " << A::nSwaps << " swaps
";
}

The output of the program depends on the compiler and the machine, but they are quite similar in their nature. On my laptop with VS2015, I get 1044ms with ~100 million swaps for sort and 824ms with 10 million swaps for shuffle.

libstdc++ and libc++ do twice as few swaps for sort (~50M) and the results are as follows. Rextester gives me similar results: gcc sort 854ms, shuffle 565ms, clang sort 874ms, shuffle 648ms. The results shown by ideone and coliru are even more drastic: ideone sort 1181ms, shuffle 1292ms and coliru sort 1157ms, shuffle 1461ms.

So what's the culprit here? Why with 5 to 10 times more swaps sort is almost as fast or even faster than a simple shuffle? I'm not even taking into account comparisons and more complex logic in std::sort including choosing insertion, heap or quick sort algorithms, etc. I doubt it's the random engine - I've even chosen the simplest one std::minstd_rand which basically does an integer multiplication and a modulo. Is it the cache misses that make shuffle relatively slow?

PS: the behaviour is the same for simple std::vector<int>

stgatilov · Accepted Answer

std::random_shuffle usually works as follows:

//random(k) generates uniform random from 0 to k-1 inclusive
for (int i = 1; i < n; i++)
  swap(arr[i], arr[random(i + 1)]);

So we can see two sources of inefficiency here:

Random number generators are often quite slow.
Each swap uses a totally random element from the vector. When the data size is large, the whole vector does not fit into CPU cache, so each such access has to wait until the data is read from RAM.

Speaking of point 2, sorting algorithms like quicksort are much more cache-friendly: most of their memory accesses hit cache.

Why is std::shuffle as slow (or even slower than) std::sort?

Tags:

c++

performance

sorting

c++11

shuffle

Rostislav

1 Answers

stgatilov

Recent Activity

Donate For Us

Why is std::shuffle as slow (or even slower than) std::sort?

Tags:

c++

performance

sorting

c++11

shuffle

Rostislav

1 Answers

stgatilov

Related questions

Recent Activity

Donate For Us