Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How big is the performance gap between std::sort and std::stable_sort in practice?

Tags:

c++

sorting

stl

Both should run in O(n log n), but in general sort is faster than stable_sort. How big is the performance gap in practice? Do you have some experience about that?

I want to sort a very large number of structs that have a size of about 20 bytes. The stability of the result would be nice in my case, but it is not a must. At the moment the underlying container is a plain array, perhaps it could be changed to a std::deque later on.

like image 431
SebastianK Avatar asked May 01 '09 10:05

SebastianK


People also ask

What is the difference between sort and stable_sort?

Difference Between sort and stable_sort() Therefore, sort() may preserve the physical order of semantically equivalent values but can't be guaranteed. stable_sort() function usually uses mergesort. Therefore, stable_sort() preserve the physical order of semantically equivalent values and its guaranteed.

Is std::sort the fastest?

In general, std::sort is indeed faster than qsort because of a couple of these things: qsort operates on void* , which first requires a dereference, and second requires the size of the data type to perform the swaps.

What is the complexity of std::sort?

Sort. The std::sort is a sorting function that uses the Introsort algorithm and have the complexity of O(N log(N)) where N= std::distance(first, last) since C++11 and the order of equal elements is not guaranteed to be preserved[3].

Is stable sort slower?

Both should run in O(n log n), but in general sort is faster than stable_sort.


2 Answers

There are good answers that compared the algorithms theoretically. I benchmarked std::sort and std::stable_sort with google/benchmark for curiosity's sake.

It is useful to point out ahead of time that;

  • Benchmark machine has 1 X 2500 MHz CPU and 1 GB RAM
  • Benchmark OS Arch Linux 2015.08 x86-64
  • Benchmark compiled with g++ 5.3.0 and clang++ 3.7.0 (-std=c++11, -O3 and -pthread)
  • BM_Base* benchmark tries to measure the time populating std::vector<>. That time should be subtracted from the sorting results for better comparison.

First benchmark sorts std::vector<int> with 512k size.

[ g++ ]# benchmark_sorts --benchmark_repetitions=10 Run on (1 X 2500 MHz CPU ) 2016-01-08 01:37:43 Benchmark                         Time(ns)    CPU(ns) Iterations ---------------------------------------------------------------- ... BM_BaseInt/512k_mean              24730499   24726189         28 BM_BaseInt/512k_stddev              293107     310668          0 ... BM_SortInt/512k_mean              70967679   70799990         10 BM_SortInt/512k_stddev             1300811    1301295          0 ... BM_StableSortInt/512k_mean        73487904   73481467          9 BM_StableSortInt/512k_stddev        979966     925172          0 
[ clang++ ]# benchmark_sorts --benchmark_repetitions=10 Run on (1 X 2500 MHz CPU ) 2016-01-08 01:39:07 Benchmark                         Time(ns)    CPU(ns) Iterations ---------------------------------------------------------------- ... BM_BaseInt/512k_mean              26198558   26197526         27 BM_BaseInt/512k_stddev              320971     348314          0 ... BM_SortInt/512k_mean              70648019   70666660         10 BM_SortInt/512k_stddev             2030727    2033062          0 ... BM_StableSortInt/512k_mean        82004375   81999989          9 BM_StableSortInt/512k_stddev        197309     181453          0 

Second benchmark sorts std::vector<S> with 512k size (sizeof(Struct S) = 20).

[ g++ ]# benchmark_sorts --benchmark_repetitions=10 Run on (1 X 2500 MHz CPU ) 2016-01-08 01:49:32 Benchmark                         Time(ns)    CPU(ns) Iterations ---------------------------------------------------------------- ... BM_BaseStruct/512k_mean           26485063   26410254         26 BM_BaseStruct/512k_stddev           270355     128200          0 ... BM_SortStruct/512k_mean           81844178   81833325          8 BM_SortStruct/512k_stddev           240868     204088          0 ... BM_StableSortStruct/512k_mean    106945879  106857114          7 BM_StableSortStruct/512k_stddev   10446119   10341548          0 
[ clang++ ]# benchmark_sorts --benchmark_repetitions=10 Run on (1 X 2500 MHz CPU ) 2016-01-08 01:53:01 Benchmark                         Time(ns)    CPU(ns) Iterations ---------------------------------------------------------------- ... BM_BaseStruct/512k_mean           27327329   27280000         25 BM_BaseStruct/512k_stddev           488318     333059          0  ... BM_SortStruct/512k_mean           78611207   78407400          9 BM_SortStruct/512k_stddev           690207     372230          0  ... BM_StableSortStruct/512k_mean    109477231  109333325          8 BM_StableSortStruct/512k_stddev   11697084   11506626          0 

Anyone who likes to run the benchmark, here is the code,

#include <vector> #include <random> #include <algorithm>  #include "benchmark/benchmark_api.h"  #define SIZE 1024 << 9  static void BM_BaseInt(benchmark::State &state) {   std::random_device rd;   std::mt19937 mt(rd());   std::uniform_int_distribution<int> dist;    while (state.KeepRunning()) {     std::vector<int> v;     v.reserve(state.range_x());     for (int i = 0; i < state.range_x(); i++) {       v.push_back(dist(mt));     }   } } BENCHMARK(BM_BaseInt)->Arg(SIZE);  static void BM_SortInt(benchmark::State &state) {   std::random_device rd;   std::mt19937 mt(rd());   std::uniform_int_distribution<int> dist;    while (state.KeepRunning()) {     std::vector<int> v;     v.reserve(state.range_x());     for (int i = 0; i < state.range_x(); i++) {       v.push_back(dist(mt));     }      std::sort(v.begin(), v.end());   } } BENCHMARK(BM_SortInt)->Arg(SIZE);  static void BM_StableSortInt(benchmark::State &state) {   std::random_device rd;   std::mt19937 mt(rd());   std::uniform_int_distribution<int> dist;    while (state.KeepRunning()) {     std::vector<int> v;     v.reserve(state.range_x());     for (int i = 0; i < state.range_x(); i++) {       v.push_back(dist(mt));     }      std::stable_sort(v.begin(), v.end());   } } BENCHMARK(BM_StableSortInt)->Arg(SIZE);   struct S {   int key;   int arr[4]; };  static void BM_BaseStruct(benchmark::State &state) {   std::random_device rd;   std::mt19937 mt(rd());   std::uniform_int_distribution<int> dist;    while (state.KeepRunning()) {     std::vector<S> v;     v.reserve(state.range_x());     for (int i = 0; i < state.range_x(); i++) {       v.push_back({dist(mt)});     }   } } BENCHMARK(BM_BaseStruct)->Arg(SIZE);  static void BM_SortStruct(benchmark::State &state) {   std::random_device rd;   std::mt19937 mt(rd());   std::uniform_int_distribution<int> dist;    while (state.KeepRunning()) {     std::vector<S> v;     v.reserve(state.range_x());     for (int i = 0; i < state.range_x(); i++) {       v.push_back({dist(mt)});     }      std::sort(v.begin(), v.end(),               [](const S &a, const S &b) { return a.key < b.key; });   } } BENCHMARK(BM_SortStruct)->Arg(SIZE);  static void BM_StableSortStruct(benchmark::State &state) {   std::random_device rd;   std::mt19937 mt(rd());   std::uniform_int_distribution<int> dist;    while (state.KeepRunning()) {     std::vector<S> v;     v.reserve(state.range_x());     for (int i = 0; i < state.range_x(); i++) {       v.push_back({dist(mt)});     }      std::stable_sort(v.begin(), v.end(),                      [](const S &a, const S &b) { return a.key < b.key; });   } } BENCHMARK(BM_StableSortStruct)->Arg(SIZE);   BENCHMARK_MAIN(); 
like image 161
Alper Avatar answered Oct 18 '22 00:10

Alper


std::stable_sort performs NlogN comparisons when sufficient memory is available. When insufficient memory is available, it degrades to N((logN)^2) comparisons. Therefore it is roughly of the same efficiency as std::sort (which performs O(NlogN) comparisons in both average and worst case) when memory is available.

For those interested, sort() uses an introsort (quicksort which switches to heapsort when the recursion reaches a certain depth) and stable_sort() uses a merge sort.

like image 45
moinudin Avatar answered Oct 18 '22 01:10

moinudin