Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ Parallel std::vector Sorting With Expensive Copying

Assume I have a vector<int> intVec and a vector<vector<double> > matrix. I want to sort intVec and reorder the first dimension of matrix correspondingly in C++. I realize this question has been asked several times before, but this case has a twist. A vector<double> is expensive to copy, so e.g. copying both intVec and matrix to a vector<pair<int, vector<double> >, sorting that and copying them back is even more inefficient than usual.

Short of rolling my own custom sorting algorithm, how can I sort intVec and reorder the first dimension of matrix in lockstep without copying any element of matrix and invoking vector's copy constructor?

like image 689
dsimcha Avatar asked Dec 27 '12 18:12

dsimcha


3 Answers

A vector<double> is expensive to copy, so e.g. copying both intVec and matrix to a vector<pair<int, vector<double> >, sorting that and copying them back is even more inefficient than usual.

The simplest way to get the optimization you want then is to swap the elements of the source vector<vector<double>> into the temporary vector<pair<int, vector<double>>, sort it, and then swap them back to their new locations in the original vector.

There will still be more overhead than is strictly necessary (for example to construct and destroy empty vectors). However, no vector is ever copied and the code remains very similar to what you already have. So if you are correct that the problem is the cost of copying, problem solved.

In C++11 you could move in both directions rather than swapping. I doubt there's much performance difference between moving and swapping with an empty vector, but I'm not sure there isn't.

like image 78
Steve Jessop Avatar answered Nov 18 '22 07:11

Steve Jessop


One option: create a std::vector<std::pair<int,size_t>> where the first element is the int from intVec and the second element is the original index of that element. Then sort that new vector. Then shuffle your matrix and intVec to the order indicated by the second elements of the pairs (e.g., with a single pass, doing swaps).

like image 1
Edward Loper Avatar answered Nov 18 '22 08:11

Edward Loper


Based on the space between your two >'s, I'm guessing you're using pre-C++11 C++. In C++11, std::sort seems to move elements whenever possible instead of copying.

You can pass a custom comparator to std::sort. However, even if you do this, you're doing Theta(n log n) copies of pair<int, vector<double> >'s.

I'd guess, based on not actually trying it, that you should sort a pair<int, vector<double> *> (or pair<int, int> if int is big enough), using the second int as an index) instead to get the appropriate permutation and then apply the permutation using vector's swap member function to avoid copying the vector contents.

like image 1
tmyklebu Avatar answered Nov 18 '22 06:11

tmyklebu