Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently sort an array and get the permutation index?

I have two random access iterables: one containing the values and the other containing a permutation index. I want to sort the values and track the changes in the permutation index.

Sample input:

std::array values{7, 45, 18, 33, 77, 96, 83, 80, 4, 51};
std::array<int, values.size()> index;

Output:

values == {4, 7, 18, 33, 45, 51, 77, 80, 83, 96};
index == {8, 0, 2, 3, 1, 9, 4, 7, 6, 5};

The simple solution

One solution is to construct a vector of pairs containing the value and its index, sort this vector by the values and then extract the solution from this. But the original vector may be large so I would like to avoid converting it twice. I will nickname this solution vector pair sort.

Another solution

I can just std::sort the permutation index (which would contain 0, 1, 2... in the beginning) using values in the values iterable. This is described at How to obtain the index permutation after the sorting This would give me the permutation index but it won't sort the value iterable. I could do two things from here:

  1. std::sort() the values.

    I would have to sort the data twice. I would like to avoid that. The comparison operator might not be cheap. This solution will be double sort.

  2. Sort the values using the permutation index.

    I am not exactly sure how to do that. One solution is described here. This solution is boost index apply sort (because it is implemented by boost::algorithm::apply_permutation()). This modifies the permutation index so I have to make a copy before passing it to boost::algorithm::apply_permutation().

    A simple solution that can do it in place is described here. This is permutate in place sort.

    Another solution would be to convert the index into cycles and then apply it. I haven't implemented this.

The "best" solution

I think that the most logical approach is to sort the values normally, but when the sorting algorithm swaps two values, it should also swap corresponding indexed in the permutation index.

This would sort both iterables in place at once without any added cost.

But I have no idea how to implement this. If I would be implementing the sorting algorithm, I could just add a extra swap() and the issue would be solved. But I'd like to reuse std::sort(). It gives me a reasonably fast sorting algorithm without the need of writing it. I could also replace it with a different algorithm in the future by just replacing std::sort() with custom_sort() (assuming that custom_sort() has same signature which is a reasonable assumption).

I am interested in swap(). std::sort() is using this function to swap elements. My goal is to override this function and to make it swap the actual values and the indexes.

One way of doing that is creating a custom iterator that would combine iterators of values and indexes. This iterator would return a reference to a wrapper value upon calling operator* or operator[]. This wrapper value would have a custom swap().

Benchmarks

By the way I have done some benchmarks. You can find them and the implementations at https://github.com/meator/indexsort_impl I tried to sort a vector containing 1 000 000 random ints and doubles. The ints are ranging from INT_MIN to INT_MAX and the doubles are ranging from 0 to 1. I ran 200 samples. Here are the results:

function int benchmark double benchmark
value sort* 114.153 ms 127.824 ms
index sort using values* 177.985 ms 206.153 ms
vector pair sort 122.795 ms 139.467 ms
double sort 291.954 ms 339.121 ms
boost index apply sort 263.827 ms 298.844 ms
permutate in place sort 679.917 ms 708.676 ms

The "simple" vector pair sort is surprisingly good. Maybe the compiler is smart.

Question

Is my "best" solution with iterators viable? How could I implement that? Are there some more solutions that I have dismissed? Which is the best?


EDIT: My benchmarks contained a massive flaw which invalidated all results. I have fixed that. I also added raw sort benchmarks as per Paul Sanders'es suggestion. These of course aren't indexsort.

* As mentioned above, these aren't actual indexsort.

like image 767
user13840624 Avatar asked Sep 17 '25 05:09

user13840624


2 Answers

Your best solution should be doable. According to std::sort one signature until C++20 is

template< class RandomIt >
void sort( RandomIt first, RandomIt last );

and after it changes to

template< class RandomIt >
constexpr void sort( RandomIt first, RandomIt last );

where class RandomIt meets the requirement of both LegacyRandomAccessIterator and ValueSwappable. Which looks like the iterator can contain an index and pointers to both arrays, implement the right methods, and you have a swap function that takes 2 iterators, and swaps the contents of both functions.

And then you can use std::sort with no new large data structures.

like image 143
btilly Avatar answered Sep 18 '25 17:09

btilly


Your "best" solution is not going to be best, because it does O(N log N) value swaps when only O(N) are required.

I think the actually best option is to sort the indexes using the values, and then reorder the values in O(N) time using the permutation index.

Here's a simple implementation that temporarily inverts entries in the permutation index to mark which value positions we've sorted, and then fixes them all. When it's finished, both data and index are correct.

I stole the sort part from @SergeyKalinichenko's answer that you linked:

vector<int> index(data.size(), 0);
for (int i = 0 ; i != index.size() ; i++) {
    index[i] = i;
}
sort(index.begin(), index.end(),
    [&](const int& a, const int& b) {
        return (data[a] < data[b]);
    }
);
for (int target = 0; target < index.size(); ++target) {
    int src = index[target];
    if (src < 0) {
        // did this one already
        index[target] = ~src;
        continue;
    }
    if (src == target) {
        //already in the right place
        continue;
    }
    auto save = data[target];
    int t2 = target;
    do {
        data[t2] = data[src];
        t2 = src;
        src = index[t2];
        index[t2] = ~src; // mark t2 position as done. invar: t2 > target
    } while(src != target);
    data[t2] = save;
}
like image 28
Matt Timmermans Avatar answered Sep 18 '25 18:09

Matt Timmermans