I have a HUGE table (about 50Gb) in (i,j,k) format (from a sparse matrix) stored as
uint32_t * idx1, * idx2;
float * vals;
uint32_t tablesize;
and I'd like to sort it in place with a given comparison function that's a function of idx1 and idx2. Can this be done using std::sort?
Specifically, each nonzero entry (i,j) with value v in the sparse matrix is stored by placing i in idx1, j in idx2, and v in the corresponding entry in vals. I'd like to then sort these entries according to (i1, j1, v1) <= (i2, j2, v2) if
(i1 < i2) || (i1==i2 && j1 <= j2)
The examples I've been able to scrounge up of using std::sort on nonstandard datatypes assume that each item being compared is a single instance of a class; here each item is represented by three values in different arrays.
Sort is an in-built function in a C++ STL ( Standard Template Library). This function is used to sort the elements in the range in ascending or descending order.
In more details it is implemented using hybrid of QuickSort, HeapSort and InsertionSort.By default, it uses QuickSort but if QuickSort is doing unfair partitioning and taking more than N*logN time, it switches to HeapSort and when the array size becomes really small, it switches to InsertionSort.
Sorting is one of the most basic functions applied to data. It means arranging the data in a particular fashion, which can be increasing or decreasing. There is a built-in function in C++ STL by the name of sort(). std::sort() is a generic function in C++ Standard Library, for doing comparison sorting.
As of September 2020, it appears that libc++ std::sort happens to be stable for all ranges of size less than 31, and libstdc++ std::sort happens to be stable for all ranges of size less than 17. (Do not rely on this little factoid in production!)
// sort () in STL. How to sort in descending order? sort () takes a third parameter that is used to specify the order in which elements are to be sorted. We can pass the “greater ()” function to sort in descending order.
To sort array or containers present in C++ STL in decreasing order, we have to pass a third parameter called greater<type> () in the sort () function. This function performs comparison so that elements at the end are sorted in descending order.
Sort Table by Clicking the Headers. Click the headers to sort the table. Click "Name" to sort by names, and "Country" to sort by country. The first time you click, the sorting direction is ascending (A to Z).
Click the headers to sort the table. Click "Name" to sort by names, and "Country" to sort by country. The first time you click, the sorting direction is ascending (A to Z).
It is unfortunately quite difficult to convince std::sort
, or any of the standard library, to work with striped data. It is designed to assume that data can be copied via a single =
, moved via one move
or swapped via one swap
.
Your best bet is to use boost::iterator_facade
to write a custom iterator class which wraps the data, and hides the striped data format from std::sort
. I've wanted to do something similar in the past but my workspace does not allow us to use boost
. EDIT: when your facade is dereferenced, it will probably need to create some sort of proxy object that can be assigned/moved/swapped and will do the right thing to each of the stripe arrays. It's not trivial.
The next best bet is to make an array of int
s from zero to N, each one representing an index into your striped data array. Write a custom functor to std::sort
which sorts this array to match your criteria. It's obviously far from ideal when you have such a large data set.
If you have to keep using your existing data structure, which is essentially a std::tuple
of three std::vector
s, using boost::zip_iterator
would seem to be the way to go. A zip_iterator
treats three iterators (two to indices and one to a value) as a single tuple, and you can use a custom comparison function object to sort your data in-place. Alas, boost::zip_iterator
can't be used with std::sort
, as explained in this Q&A, because it can't be written into.
This means that you would have to write your own zip_iterator class that can be used with std::sort
. Note that it is not a trivial exercise, see this Q&A and/or this paper.
It is a lot easier to sort a std::vector
of a std::tuple
. My attempt below uses a std::tuple
of two indices and a value, and stores those entries into a std::vector
. For the sorting, I use a C++14 generic lambda that forwards the two indices into a smaller tuple and compares those lexicographically (i.e. first on row-index, then on column-index) using the library operator<
of std::tuple
.
#include <algorithm>
#include <iostream>
#include <tuple>
#include <vector>
using index = uint32_t;
using value = float;
using sparse_entry = std::tuple<index, index, value>;
using sparse_matrix = std::vector<sparse_entry>;
int main()
{
// sparse 3x3 matrix
auto m = sparse_matrix {
std::make_tuple( 1, 1, -2.2),
std::make_tuple( 1, 0, 42 ),
std::make_tuple( 0, 2, 3.4),
std::make_tuple( 0, 1, 1.7)
};
// sort by row-index, then column-index
std::sort(begin(m), end(m), [](auto const& L, auto const& R) {
return
std::forward_as_tuple(std::get<0>(L), std::get<1>(L)) <
std::forward_as_tuple(std::get<0>(R), std::get<1>(R))
;
});
for (auto const& elem : m)
std::cout << "{ " << std::get<0>(elem) << ", " << std::get<1>(elem) << ", " << std::get<2>(elem) << "}, \n";
}
Live Example.
If your application can use this transformed data layout (and there maybe cache-performance reasons why it can't), then the above code will do the sorting as you want it.
NOTE: as @Casey mentions, you can also use std::tie
instead of std::forward_as_tuple
, but that can bite you when you change sparse_entry
into a full-fledged user-defined class with getters returning by-value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With