Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to sort a list of number and their index

I have a question that could seem very basic, but it is in a context where "every CPU tick counts" (this is a part of a larger algorithm that will be used on supercomputers).

The problem is quite simple : what is the fastest way to sort a list of unsigned long long int numbers and their original indexes ? (At the beginning, the unsigned long long int numbers are in a completely random order.)

Example :
Before
Numbers: 32 91 11 72
Indexes: 0 1 2 3
After
Numbers: 11 32 72 91
Indexes: 2 0 3 1 

By "fastest way", I mean : what algorithm to use : std::sort, C qsort, or another sorting algorithm available on the web ? What container to use (C array, std::vector, std::map...) ? How to sort the indexes at the same time (use structures, std::pair, std::map...) ?

How many element to sort ? -> typically 4Go of numbers

like image 776
Vincent Avatar asked Apr 23 '12 20:04

Vincent


3 Answers

The obvious starting point would be a structure with operator< defined for it:

struct data { 
    unsigned long long int number;
    size_t index;
};

struct by_number { 
    bool operator()(data const &left, data const &right) { 
        return left.number < right.number;
    }
};

...and an std::vector to hold the data:

 std::vector<data> items;

and std::sort to do the sorting:

 std::sort(items.begin(), items.end(), by_number());

The simple fact is, that the normal containers (and such) are sufficiently efficient that using them doesn't make your code substantially less efficient. You might be able to do better by writing some part in a different way, but you might about as easily do worse. Start from solid and readable, and test -- don't (attempt to) optimize prematurely.

Edit: of course in C++11, you can use a lambda expression instead:

std::sort(items.begin(), items.end(), 
          [](data const &a, data const &b) { return a.number < b.number; });

This is generally a little more convenient to write. Readability depends--for something simple like this, I'd say sort ... by_number is pretty readable, but that depends (heavily) on the name you give to the comparison operator. The lambda makes the actual sorting criteria easier to find, so you don't need to choose a name carefully for the code to be readable.

like image 65
Jerry Coffin Avatar answered Nov 07 '22 01:11

Jerry Coffin


std::pair and std::sort fit your requirements ideally: if you put the value into the pair.first and the index in pair.second, you can simply call a sort on a vector of pairs, like this:

// This is your original data. It does not need to be in a vector
vector<long> orig;
orig.push_back(10);
orig.push_back(3);
orig.push_back(6);
orig.push_back(11);
orig.push_back(2);
orig.push_back(19);
orig.push_back(7);
// This is a vector of {value,index} pairs
vector<pair<long,size_t> > vp;
vp.reserve(orig.size());
for (size_t i = 0 ; i != orig.size() ; i++) {
    vp.push_back(make_pair(orig[i], i));
}
// Sorting will put lower values ahead of larger ones,
// resolving ties using the original index
sort(vp.begin(), vp.end());
for (size_t i = 0 ; i != vp.size() ; i++) {
    cout << vp[i].first << " " << vp[i].second << endl;
}
like image 20
Sergey Kalinichenko Avatar answered Nov 07 '22 01:11

Sergey Kalinichenko


std::sort has proven to be faster than the old qsort because of the lack of indirection and the possibility of inlining critical operations.

The implementations of std::sort are likely to be highly optimized and hard to beat, but not impossible. If your data is fixed length and short you might find Radix sort to be faster. Timsort is relatively new and has delivered good results for Python.

You might keep the index array separate from the value array, but I think the extra level of indirection will prove to be a speed killer. Better to keep them together in a struct or std::pair.

As always with any speed critical application, you must try some actual implementations and compare them to know for sure which is fastest.

like image 3
Mark Ransom Avatar answered Nov 07 '22 01:11

Mark Ransom