Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sorting table in place using stl sort

I have a HUGE table (about 50Gb) in (i,j,k) format (from a sparse matrix) stored as

uint32_t * idx1, * idx2;
float * vals;
uint32_t tablesize;

and I'd like to sort it in place with a given comparison function that's a function of idx1 and idx2. Can this be done using std::sort?

Specifically, each nonzero entry (i,j) with value v in the sparse matrix is stored by placing i in idx1, j in idx2, and v in the corresponding entry in vals. I'd like to then sort these entries according to (i1, j1, v1) <= (i2, j2, v2) if

(i1 < i2) || (i1==i2 && j1 <= j2)

The examples I've been able to scrounge up of using std::sort on nonstandard datatypes assume that each item being compared is a single instance of a class; here each item is represented by three values in different arrays.

like image 595
AatG Avatar asked Nov 11 '14 22:11

AatG


People also ask

Is STL sort in place?

Sort is an in-built function in a C++ STL ( Standard Template Library). This function is used to sort the elements in the range in ascending or descending order.

Which sorting algorithm is used in STL sort?

In more details it is implemented using hybrid of QuickSort, HeapSort and InsertionSort.By default, it uses QuickSort but if QuickSort is doing unfair partitioning and taking more than N*logN time, it switches to HeapSort and when the array size becomes really small, it switches to InsertionSort.

How does sort STL work in C++?

Sorting is one of the most basic functions applied to data. It means arranging the data in a particular fashion, which can be increasing or decreasing. There is a built-in function in C++ STL by the name of sort(). std::sort() is a generic function in C++ Standard Library, for doing comparison sorting.

Is sort STL stable?

As of September 2020, it appears that libc++ std::sort happens to be stable for all ranges of size less than 31, and libstdc++ std::sort happens to be stable for all ranges of size less than 17. (Do not rely on this little factoid in production!)

How to sort in STL with sort () function?

// sort () in STL. How to sort in descending order? sort () takes a third parameter that is used to specify the order in which elements are to be sorted. We can pass the “greater ()” function to sort in descending order.

How to sort array or containers in C++ STL in decreasing order?

To sort array or containers present in C++ STL in decreasing order, we have to pass a third parameter called greater<type> () in the sort () function. This function performs comparison so that elements at the end are sorted in descending order.

How do you sort a table in a table view?

Sort Table by Clicking the Headers. Click the headers to sort the table. Click "Name" to sort by names, and "Country" to sort by country. The first time you click, the sorting direction is ascending (A to Z).

How do I sort a table by name and country?

Click the headers to sort the table. Click "Name" to sort by names, and "Country" to sort by country. The first time you click, the sorting direction is ascending (A to Z).


2 Answers

It is unfortunately quite difficult to convince std::sort, or any of the standard library, to work with striped data. It is designed to assume that data can be copied via a single =, moved via one move or swapped via one swap.

Your best bet is to use boost::iterator_facade to write a custom iterator class which wraps the data, and hides the striped data format from std::sort. I've wanted to do something similar in the past but my workspace does not allow us to use boost. EDIT: when your facade is dereferenced, it will probably need to create some sort of proxy object that can be assigned/moved/swapped and will do the right thing to each of the stripe arrays. It's not trivial.

The next best bet is to make an array of ints from zero to N, each one representing an index into your striped data array. Write a custom functor to std::sort which sorts this array to match your criteria. It's obviously far from ideal when you have such a large data set.

like image 145
StilesCrisis Avatar answered Nov 03 '22 15:11

StilesCrisis


If you have to keep using your existing data structure, which is essentially a std::tuple of three std::vectors, using boost::zip_iterator would seem to be the way to go. A zip_iterator treats three iterators (two to indices and one to a value) as a single tuple, and you can use a custom comparison function object to sort your data in-place. Alas, boost::zip_iterator can't be used with std::sort, as explained in this Q&A, because it can't be written into.

This means that you would have to write your own zip_iterator class that can be used with std::sort. Note that it is not a trivial exercise, see this Q&A and/or this paper.

It is a lot easier to sort a std::vector of a std::tuple. My attempt below uses a std::tuple of two indices and a value, and stores those entries into a std::vector. For the sorting, I use a C++14 generic lambda that forwards the two indices into a smaller tuple and compares those lexicographically (i.e. first on row-index, then on column-index) using the library operator< of std::tuple.

#include <algorithm>
#include <iostream>
#include <tuple>
#include <vector>

using index = uint32_t;
using value = float;
using sparse_entry = std::tuple<index, index, value>;
using sparse_matrix = std::vector<sparse_entry>;

int main()
{
    // sparse 3x3 matrix
    auto m = sparse_matrix { 
        std::make_tuple( 1, 1, -2.2), 
        std::make_tuple( 1, 0, 42  ), 
        std::make_tuple( 0, 2,  3.4), 
        std::make_tuple( 0, 1,  1.7) 
    };    

    // sort by row-index, then column-index
    std::sort(begin(m), end(m), [](auto const& L, auto const& R) {
        return 
            std::forward_as_tuple(std::get<0>(L), std::get<1>(L)) <
            std::forward_as_tuple(std::get<0>(R), std::get<1>(R))
        ;
    });

    for (auto const& elem : m)
        std::cout << "{ " << std::get<0>(elem) << ", " << std::get<1>(elem) << ", " << std::get<2>(elem) << "}, \n";
}

Live Example.

If your application can use this transformed data layout (and there maybe cache-performance reasons why it can't), then the above code will do the sorting as you want it.

NOTE: as @Casey mentions, you can also use std::tie instead of std::forward_as_tuple, but that can bite you when you change sparse_entry into a full-fledged user-defined class with getters returning by-value.

like image 34
TemplateRex Avatar answered Nov 03 '22 13:11

TemplateRex