Here is a comparison I made. np.argsort
was timed on a float32 ndarray consisting of 1,000,000 elements.
In [1]: import numpy as np
In [2]: a = np.random.randn(1000000)
In [3]: a = a.astype(np.float32)
In [4]: %timeit np.argsort(a)
86.1 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
And here is a C++ program that does the same procedure, but on a std::vector, following this answer.
#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>
#include <opencv2/opencv.hpp>
#include <numeric>
#include <utility>
int main()
{
    // fill a vector with 1,000,000 random floats in [0, 1]
    std::vector<float> numbers;
    for (int i = 0; i != 1000000; ++i) {
        numbers.push_back((float)rand() / (RAND_MAX));
    }

    double e1 = (double)cv::getTickCount();
    // indirect sort: sort the indices by the values they refer to
    std::vector<size_t> idx(numbers.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(), [&numbers](const size_t &a, const size_t &b)
              { return numbers[a] < numbers[b]; });
    double e2 = (double)cv::getTickCount();

    std::cout << "Finished in " << 1000 * (e2 - e1) / cv::getTickFrequency() << " milliseconds." << std::endl;
    return 0;
}
It prints Finished in 525.908 milliseconds, which is far slower than the NumPy version. So could anyone explain what makes np.argsort
so fast? Thanks.
Edit1: np.__version__
returns 1.15.0, running on Python 3.6.6 |Anaconda custom (64-bit), and g++ --version
prints 8.2.0. The operating system is Manjaro Linux.
Edit2: I tried to compile with the -O2
and -O3
flags in g++
and got results of 216.515 milliseconds and 205.017 milliseconds. That is an improvement, but still slower than the NumPy version. (Referring to this question.) I have since struck those numbers out, because I mistakenly ran that test with my laptop's DC adapter unplugged, which would cause it to slow down; in a fair comparison, the C-array and vector versions perform equally (both take about 100 ms).
Edit3: Another approach is to replace the vector with a C-style array: float numbers[1000000];
. After that, the running time is about 100 ms (±5 ms). Full code here:
#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>
#include <opencv2/opencv.hpp>
#include <numeric>
#include <utility>
int main()
{
    // C-style array instead of std::vector<float> for the values
    float numbers[1000000];
    for (int i = 0; i != 1000000; ++i) {
        numbers[i] = ((float)rand() / (RAND_MAX));
    }

    double e1 = (double)cv::getTickCount();
    std::vector<size_t> idx(1000000);
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(), [&numbers](const size_t &a, const size_t &b)
              { return numbers[a] < numbers[b]; });
    double e2 = (double)cv::getTickCount();

    std::cout << "Finished in " << 1000 * (e2 - e1) / cv::getTickFrequency() << " milliseconds." << std::endl;
    return 0;
}
NumPy's np.argsort can perform a stable sort by passing the kind='stable' argument. Also, np.argsort does not support reverse (descending) order directly.
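A minimal sketch of both points, assuming NumPy 1.15+ (where kind='stable' is available); the negation and reversed-slice tricks below are the usual workarounds for descending order, not a dedicated API:

import numpy as np

a = np.random.randn(1000000).astype(np.float32)

idx_stable = np.argsort(a, kind='stable')  # stable indirect sort
idx_desc = np.argsort(-a, kind='stable')   # descending order by negating the values
# idx_desc = np.argsort(a)[::-1]           # alternative: reverse the ascending result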
From the NumPy documentation: np.argsort returns the indices that would sort an array. It performs an indirect sort along the given axis using the algorithm specified by the kind keyword, and returns an array of indices of the same shape as a that index data along the given axis in sorted order.
np.sort() returns the sorted array, whereas np.argsort() returns an array of the corresponding indices. For example, sorting the unsorted array [10, 6, 8, 2, 5, 4, 9, 1] gives [1, 2, 4, 5, 6, 8, 9, 10], while np.argsort gives the positions those values came from.
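To make the distinction concrete, a small example with that array (expected output shown in the comments):

import numpy as np

x = np.array([10, 6, 8, 2, 5, 4, 9, 1])
print(np.sort(x))        # [ 1  2  4  5  6  8  9 10] -> the sorted values
print(np.argsort(x))     # [7 3 5 4 1 2 6 0]         -> indices that would sort x
print(x[np.argsort(x)])  # indexing with those indices reproduces np.sort(x)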
I took your implementation and measured it with 10000000
items. It took approximately 1.7 seconds.
Now I introduced a class
class valuePair {
public:
    valuePair(int idx, float value) : idx(idx), value(value) {}
    int idx;
    float value;
};
which is initialized as
std::vector<valuePair> pairs;
for (int i = 0; i != 10000000; ++i) {
    pairs.push_back(valuePair(i, (double)rand() / (RAND_MAX)));
}
and the sorting is then done with
std::sort(pairs.begin(), pairs.end(), [&](const valuePair &a, const valuePair &b) { return a.value < b.value; });
This code reduces the runtime down to 1.1 seconds. I think this is due to better cache locality, but it is still quite far away from the Python results.
Ideas:
- different underlying algorithm: np.argsort
uses quicksort by default, while the std::sort implementation in C++
may depend on your compiler's standard library (see the sketch after this list).
- function call overhead: I'm not sure this is the case, since according to this post C++
compilers inline your comparison function. If not, calling this function might also introduce some overhead.
- compiler flags?
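To probe the first idea, one can time np.argsort with each kind value on the same data. A rough sketch (absolute numbers are machine-dependent, and kind='stable' needs NumPy 1.15+):

import numpy as np
import timeit

a = np.random.randn(1000000).astype(np.float32)

for kind in ('quicksort', 'mergesort', 'heapsort', 'stable'):
    t = timeit.timeit(lambda: np.argsort(a, kind=kind), number=10) / 10
    print('{:10s} {:.1f} ms'.format(kind, t * 1000))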