I am comparing the performance between Julia and C++. Then I found quick sort is much faster in Julia (and even faster than C++), especially when the size of array is very big.
Can anyone explain the reasons?
quickSort.jl
include("../dimension.jl")
function execute()
n = getDimension()
print(stderr, "Julia,quickSort_optim,$n,"); # use default delimiter
arr = zeros(Int32, n)
for i = 1:n
arr[i] = (777*(i-1)) % 10000
end
if n > 0
sort!(arr; alg=QuickSort)
end
end
# executing ...
execute()
quickSort_boost.cpp
#include "dimension.h"
#include <boost/lambda/lambda.hpp>
#include <boost/sort/pdqsort/pdqsort.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>
using namespace std;
using namespace boost::sort;
int main()
{
int n = getDimension();
cerr << "C++,quickSort_boost," << n << ",";
vector<int> arr(n);
unsigned long long w;
for(int i = 0; i < n; ++i){ // array for sorting
w = (777*i) % 10000; // Array with values between 0 and 10000
arr[i] = w;
}
if (n > 0){
pdqsort_branchless(arr.begin(), arr.end(), [](const int &a, const int &b){return ( a < b );});
}
return 0;
}
Comparison
Note
The function getDimension() is used to get the array size.
The execution time is measured with the shell command: /usr/bin/time under Ubuntu. Compiler: clang version 6.0.0-1ubuntu2. Optimization level: -02. CPU: Intel i7-3820QM
The reason that I compared the whole execution time rather than just the algorithm itself is because I want to compare the performance between these two languages, which simulates a real application scenario.
In Julia official document, it writes: QuickSort: good performance for large collections. Is this because Julia uses a special implementation inside the algorithm.
More samples
I run the test with more samples. It seems that the distribution of the data is the issue.
function execute() # julia code segment for specifying data
for i = 1:n
arr[i] = i
end
for(int i = 0; i < n; ++i){ // c++ boost code segment for specifying data
arr[i] = i + 1;
}
function execute() # julia code segment for specifying data
for i = 1:n
arr[i] = n - i + 1
end
for(int i = 0; i < n; ++i){ // c++ boost code segment for specifying data
arr[i] = n - i;
}
function execute() # julia code segment for specifying data
for i = 1:n
arr[i] = i % 10
end
for(int i = 0; i < n; ++i){ // c++ boost code segment for specifying data
arr[i] = (i + 1) % 10;
}
My source code
Not sure what's up with the timings – you don't include enough code to test or reproduce. The Julia code for QuickSort is pretty simple, you can see the source for it here:
https://github.com/JuliaLang/julia/blob/77487611fd9751c0f31ac3853072e6ca11efaf05/base/sort.jl#L518-L579
I'll include the code for easy reading here as well:
@inline function selectpivot!(v::AbstractVector, lo::Integer, hi::Integer, o::Ordering)
@inbounds begin
mi = midpoint(lo, hi)
# sort v[mi] <= v[lo] <= v[hi] such that the pivot is immediately in place
if lt(o, v[lo], v[mi])
v[mi], v[lo] = v[lo], v[mi]
end
if lt(o, v[hi], v[lo])
if lt(o, v[hi], v[mi])
v[hi], v[lo], v[mi] = v[lo], v[mi], v[hi]
else
v[hi], v[lo] = v[lo], v[hi]
end
end
# return the pivot
return v[lo]
end
end
function partition!(v::AbstractVector, lo::Integer, hi::Integer, o::Ordering)
pivot = selectpivot!(v, lo, hi, o)
# pivot == v[lo], v[hi] > pivot
i, j = lo, hi
@inbounds while true
i += 1; j -= 1
while lt(o, v[i], pivot); i += 1; end;
while lt(o, pivot, v[j]); j -= 1; end;
i >= j && break
v[i], v[j] = v[j], v[i]
end
v[j], v[lo] = pivot, v[j]
# v[j] == pivot
# v[k] >= pivot for k > j
# v[i] <= pivot for i < j
return j
end
function sort!(v::AbstractVector, lo::Integer, hi::Integer, a::QuickSortAlg, o::Ordering)
@inbounds while lo < hi
hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)
j = partition!(v, lo, hi, o)
if j-lo < hi-j
# recurse on the smaller chunk
# this is necessary to preserve O(log(n))
# stack space in the worst case (rather than O(n))
lo < (j-1) && sort!(v, lo, j-1, a, o)
lo = j+1
else
j+1 < hi && sort!(v, j+1, hi, a, o)
hi = j-1
end
end
return v
end
Nothing special going on there, just a basic well-optimized quicksort. So the real question is why the C++/Boost version is slow, which I'm not in a good position to answer. Perhaps one of the many C++ experts hanging around here can address that.
It seems like it's possible that C++ does not choose pivots as well as Julia does. Pivot selection is a bit of a balancing act: on the one hand, if you pick bad pivots the asymptotic behavior can become really bad; on the other hand, if you spend too much time and effort picking pivots, it can slow the whole sort down considerably.
Another possibility is that Boost's quicksort implementation doesn't use a different algorithm as a base case once the array size gets smaller. That is pretty key for good performance since quicksort doesn't perform well on small arrays. The Julia version switches to insertion sort for arrays of less than 20 elements.
TL;DR Julia and C++'s std::sort
are extremely close in performance. Boost's pdqsort_branchless
is faster.
Methodology:
-march=native -mtune=native
ref
Code:
pdqsort_branchless
is faster. The STL's std::sort
and Julia's sort are reasonably close together in runtime performance.qsort
which is slower. That's why I don't show it any further.Plot explanation:
std::sort
and Julia.On C++, use the benchmark library.
For your convenience, you can access it online at Quick-Bench. Here is this code: https://quick-bench.com/q/CVu8y1fjwh19AHJH1iUCogxG6Pk
This lets you specify which part of your code you want to measure. It will also do the measuring and the statistics correctly.
Here's an example:
static void Example(benchmark::State &state) {
std::vector<int> data(1024);
std::iota(data.begin(), data.end(), 0);
std::mt19937 mersenne_engine{1234};
for (auto _ : state) {
state.PauseTiming();
std::shuffle(data.begin(), data.end(), mersenne_engine);
state.ResumeTiming();
std::sort(data.begin(), data.end());
}
}
BENCHMARK(Example);
Google Benchmark only measures code inside the for (auto _ : state) {...}
block (the timed block). So, the code above avoids the pitfall of measuring setup time.
Before each run, the data is shuffle
d.
Use the BenchmarkTools
library, where the benchmark is done like this:
Random.seed!(1234)
function make_vec(n)
return Random.randperm(Int32(n));
end
@benchmark sort!(arr; alg=QuickSort) setup=(arr=make_vec(1024));
We need to clarify the way you generate data.
As you discover, the algorithms work fastest on already sorted data and worse on reverse-sorted data.
In the end, the domain knowledge of where the benchmarked algorithm is run needs to be known. What is the shape of the data distribution?
First, let's just have uniformly distributed data in [0, 10 000]
. Much better than testing on already sorted data.
I am using a fixed seed, so the data will be the same every time (for a given N). If you instantiate mersenne_engine
with rnd_device()
, you get a different seed every time, and thus differing data. Personally, I favor going with random but consistent data, to make the result less variable. Just keep it in mind for your later analysis.
Note that generating data between 0 and 10k will lead to many duplicates on inputs order of magnitudes greater than 10k.
If you want don't want to have duplicates, then generate a vector with elements [0, 1, 2, ..., N]
and then shuffle the data.
Here's how I'd do it:
// Create vector filled with 0..n
std::vector<int> arr(n);
std::iota(arr.begin(), arr.end(), 0);
std::shuffle(std::begin(arr), std::end(arr), mersenne_engine);
}
Julia:
function make_vec(n)
return Random.randperm(n);
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With