Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ - vector version implement of argsort low effiency compared to the one in numpy

Here is a comparison I made. np.argsort was timed on a float32 ndarray consists of 1,000,000 elements.

In [1]: import numpy as np

In [2]: a = np.random.randn(1000000)

In [3]: a = a.astype(np.float32)

In [4]: %timeit np.argsort(a)
86.1 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

And here is a C++ program do the same procedure but on vectors referring to this answer.

#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>
#include <opencv2/opencv.hpp>
#include <numeric>
#include <utility>
int main()
{
  std::vector<float> numbers;
  for (int i = 0; i != 1000000; ++i) {
    numbers.push_back((float)rand() / (RAND_MAX));
  }

  double e1 = (double)cv::getTickCount();

  std::vector<size_t> idx(numbers.size());
  std::iota(idx.begin(), idx.end(), 0);

  std::sort(idx.begin(), idx.end(), [&numbers](const size_t &a, const size_t &b)
                                               { return numbers[a] < numbers[b];});

  double e2 = (double)cv::getTickCount();
  std::cout << "Finished in " << 1000 * (e2 - e1) / cv::getTickFrequency() << " milliseconds." << std::endl;
  return 0;
}

It prints Finished in 525.908 milliseconds. and it is far slower than the numpy version. So could anyone explain what makes np.argsort so fast? Thanks.


Edit1: np.__version__ returns 1.15.0 which runs on Python 3.6.6 |Anaconda custom (64-bit) and g++ --version prints 8.2.0. Operating system is Manjaro Linux.


Edit2: I treid to compile with -O2 and -O3 flags in g++ and I got result within 216.515 miliseconds and 205.017 miliseconds. That is an improve but still slower than numpy version. (Referring to this question) This was deleted because I mistakenly run the test with my laptop's DC adapter unplugged, which would cause it slow down. In a fair competition, C-array and vector version perform equally (take about 100ms).


Edit3: Another approach would be to replace vector with C like array: float numbers[1000000];. After that the running time is about 100ms(+/-5ms). Full code here:

#include <iostream>
#include <vector>
#include <cstddef>
#include <algorithm>
#include <opencv2/opencv.hpp>
#include <numeric>
#include <utility>
int main()
{
  //std::vector<float> numbers;
  float numbers[1000000];
  for (int i = 0; i != 1000000; ++i) {
    numbers[i] = ((float)rand() / (RAND_MAX));
  }

  double e1 = (double)cv::getTickCount();

  std::vector<size_t> idx(1000000);
  std::iota(idx.begin(), idx.end(), 0);

  std::sort(idx.begin(), idx.end(), [&numbers](const size_t &a, const size_t &b)
                                               { return numbers[a] < numbers[b];});

  double e2 = (double)cv::getTickCount();
  std::cout << "Finished in " << 1000 * (e2 - e1) / cv::getTickFrequency() << " milliseconds." << std::endl;
  return 0;
}
like image 419
Page David Avatar asked Aug 23 '18 09:08

Page David


People also ask

Is NumPy Argsort stable?

NumPy's np. argsort is able to do stable sorting through passing kind = 'stable' argument. Also np. argsort doesn't support reverse (descending) order.

What does Argsort () do in Python?

Returns the indices that would sort an array. Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.

What is the difference between Argsort and sort?

sort() returns the sorted array whereas np. argsort() returns an array of the corresponding indices. The figure shows how the algorithm transforms an unsorted array [10, 6, 8, 2, 5, 4, 9, 1] into a sorted array [1, 2, 4, 5, 6, 8, 9, 10] .


2 Answers

I took your implementation and measured it with 10000000 items. It took approximated 1.7 seconds.

Now I introduced a class

class valuePair {
  public:
    valuePair(int idx, float value) : idx(idx), value(value){};
    int idx;
    float value;
};

with is initialized as

std::vector<valuePair> pairs;
for (int i = 0; i != 10000000; ++i) {
    pairs.push_back(valuePair(i, (double)rand() / (RAND_MAX)));
}

and the sorting than is done with

std::sort(pairs.begin(), pairs.end(), [&](const valuePair &a, const valuePair &b) { return a.value < b.value; });

This code reduces the runtime down to 1.1 seconds. This is I think due to a better cache coherency, but still quite far away from the python results.

like image 173
schorsch312 Avatar answered Sep 24 '22 02:09

schorsch312


Ideas:

  • different underlying algorithm:. np.argsort uses quicksort as default, the implementation in C++ might depend on your compiler.

  • function call overhead: I'm not sure if C++ compilers inline your comparison function. If not, calling this function also might introduce some overhead. not the case according to this post

  • compiler flags ?

like image 21
rocksportrocker Avatar answered Sep 23 '22 02:09

rocksportrocker