Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort by proxy (or: sort one container by the contents of another) in C++

Tags:

c++

sorting

stl

I have a set of data which is split into two arrays (let's call them data and keys). That is, for any given item with an index i, I can access the data for that item with data[i] and the key for that item with keys[i]. I cannot change this structure (eg, to interleave keys and data into a single array), because I need to pass the data array to a library function which expects a certain data layout.

How can I sort both arrays (preferably using standard library functions) according to the content of the keys array?

like image 311
John Bartholomew Avatar asked Aug 03 '10 16:08

John Bartholomew


3 Answers

Here is a sample implementation which defines a new iterator type to provide a paired view over two sequences. I have tried to make it standards compliant and correct, but since the C++ standard is hideously complex in its details, I am almost certain that I have failed. I will say that this code appears to work when built with clang++ or g++.

This code is not recommended for general use, since it is longer and less understandable than the other answers, and possibly invokes the dreaded 'undefined behaviour'.

However, it does have the advantage of constant time and space overhead since it provides a view on existing data rather than actually building a temporary alternate representation or permutation vector. The most obvious (to me) performance problem with this code is that individual elements of the two containers will have to be copied during the swap operation. Despite several attempts, I have not found a way to successfully specialise std::swap such that std::sort or std::random_shuffle will avoid using the default temporary-copy based swap implementation. It is possible that use of the C++0x rvalue reference system (see std::move and Jon Purdy's answer) could solve this.

#ifndef HDR_PAIRED_ITERATOR
#define HDR_PAIRED_ITERATOR

#include <iterator>

/// pair_view mostly looks like a std::pair,
/// and can decay to a std::pair, but is really a pair of references
template <typename ItA, typename ItB>
struct pair_view {
    typedef typename ItA::value_type first_type;
    typedef typename ItB::value_type second_type;

    typedef std::pair<first_type, second_type> pair_type;

    pair_view() {}
    pair_view(const ItA &a, const ItB &b):
        first(*a), second(*b) {}

    pair_view &operator=(const pair_view &x)
        { first = x.first; second = x.second; return *this; }
    pair_view &operator=(const pair_type &x)
        { first = x.first; second = x.second; return *this; }

    typename ItA::reference first;
    typename ItB::reference second;
    operator pair_type() const
        { return std::make_pair(first, second); }

    friend bool operator==(const pair_view &a, const pair_view &b)
        { return (a.first == b.first) && (a.second == b.second); }
    friend bool operator<(const pair_view &a, const pair_view &b)
        { return (a.first < b.first) || ((a.first == b.first) && (a.second < b.second)); }
    friend bool operator!=(const pair_view &a, const pair_view &b)
        { return !(a == b); }
    friend bool operator>(const pair_view &a, const pair_view &b)
        { return (b < a); }
    friend bool operator<=(const pair_view &a, const pair_view &b)
        { return !(b < a); }
    friend bool operator>=(const pair_view &a, const pair_view &b)
        { return !(a < b); }

    friend bool operator==(const pair_view &a, const pair_type &b)
        { return (a.first == b.first) && (a.second == b.second); }
    friend bool operator<(const pair_view &a, const pair_type &b)
        { return (a.first < b.first) || ((a.first == b.first) && (a.second < b.second)); }
    friend bool operator!=(const pair_view &a, const pair_type &b)
        { return !(a == b); }
    friend bool operator>(const pair_view &a, const pair_type &b)
        { return (b < a); }
    friend bool operator<=(const pair_view &a, const pair_type &b)
        { return !(b < a); }
    friend bool operator>=(const pair_view &a, const pair_type &b)
        { return !(a < b); }

    friend bool operator==(const pair_type &a, const pair_type &b)
        { return (a.first == b.first) && (a.second == b.second); }
    friend bool operator<(const pair_type &a, const pair_type &b)
        { return (a.first < b.first) || ((a.first == b.first) && (a.second < b.second)); }
    friend bool operator!=(const pair_type &a, const pair_type &b)
        { return !(a == b); }
    friend bool operator>(const pair_type &a, const pair_type &b)
        { return (b < a); }
    friend bool operator<=(const pair_type &a, const pair_type &b)
        { return !(b < a); }
    friend bool operator>=(const pair_type &a, const pair_type &b)
        { return !(a < b); }
};

template <typename ItA, typename ItB>
struct paired_iterator {
    // --- standard iterator traits
    typedef typename pair_view<ItA, ItB>::pair_type value_type;
    typedef pair_view<ItA, ItB> reference;
    typedef paired_iterator<ItA,ItB> pointer;

    typedef typename std::iterator_traits<ItA>::difference_type difference_type;
    typedef std::random_access_iterator_tag iterator_category;

    // --- methods not required by the Random Access Iterator concept
    paired_iterator(const ItA &a, const ItB &b):
        a(a), b(b) {}

    // --- iterator requirements

    // default construction
    paired_iterator() {}

    // copy construction and assignment
    paired_iterator(const paired_iterator &x):
        a(x.a), b(x.b) {}
    paired_iterator &operator=(const paired_iterator &x)
        { a = x.a; b = x.b; return *this; }

    // pre- and post-increment
    paired_iterator &operator++()
        { ++a; ++b; return *this; }
    paired_iterator operator++(int)
        { paired_iterator tmp(*this); ++(*this); return tmp; }

    // pre- and post-decrement
    paired_iterator &operator--()
        { --a; --b; return *this; }
    paired_iterator operator--(int)
        { paired_iterator tmp(*this); --(*this); return tmp; }

    // arithmetic
    paired_iterator &operator+=(const difference_type &n)
        { a += n; b += n; return *this; }
    friend paired_iterator operator+(const paired_iterator &x, const difference_type &n)
        { return paired_iterator(x.a+n, x.b+n); }
    friend paired_iterator operator+(const difference_type &n, const paired_iterator &x)
        { return paired_iterator(x.a+n, x.b+n); }
    paired_iterator &operator-=(const difference_type &n)
        { a -= n; b -= n; return *this; }
    friend paired_iterator operator-(const paired_iterator &x, const difference_type &n)
        { return paired_iterator(x.a-n, x.b-n); }
    friend difference_type operator-(const paired_iterator &x, const paired_iterator &y)
        { return (x.a - y.a); }

    // (in-)equality and ordering
    friend bool operator==(const paired_iterator &x, const paired_iterator &y)
        { return (x.a == y.a) && (x.b == y.b); }
    friend bool operator<(const paired_iterator &x, const paired_iterator &y)
        { return (x.a < y.a); }

    // derived (in-)equality and ordering operators
    friend bool operator!=(const paired_iterator &x, const paired_iterator &y)
        { return !(x == y); }
    friend bool operator>(const paired_iterator &x, const paired_iterator &y)
        { return (y < x); }
    friend bool operator<=(const paired_iterator &x, const paired_iterator &y)
        { return !(y < x); }
    friend bool operator>=(const paired_iterator &x, const paired_iterator &y)
        { return !(x < y); }

    // dereferencing and random access
    reference operator*() const
        { return reference(a,b); }
    reference operator[](const difference_type &n) const
        { return reference(a+n, b+n); }
private:
    ItA a;
    ItB b;
};

template <typename ItA, typename ItB>
paired_iterator<ItA, ItB> make_paired_iterator(const ItA &a, const ItB &b)
{ return paired_iterator<ItA, ItB>(a, b); }

#endif


#include <vector>
#include <algorithm>
#include <iostream>

template <typename ItA, typename ItB>
void print_kvs(const ItA &k0, const ItB &v0, const ItA &kn, const ItB &vn) {
    ItA k(k0);
    ItB v(v0);
    while (k != kn || v != vn) {
        if (k != kn && v != vn)
            std::cout << "[" << *k << "] = " << *v << "\n";
        else if (k != kn)
            std::cout << "[" << *k << "]\n";
        else if (v != vn)
            std::cout << "[?] = " << *v << "\n";

        if (k != kn) ++k;
        if (v != vn) ++v;
    }
    std::cout << std::endl;
}

int main() {
    std::vector<int> keys;
    std::vector<std::string> data;

    keys.push_back(0); data.push_back("zero");
    keys.push_back(1); data.push_back("one");
    keys.push_back(2); data.push_back("two");
    keys.push_back(3); data.push_back("three");
    keys.push_back(4); data.push_back("four");
    keys.push_back(5); data.push_back("five");
    keys.push_back(6); data.push_back("six");
    keys.push_back(7); data.push_back("seven");
    keys.push_back(8); data.push_back("eight");
    keys.push_back(9); data.push_back("nine");

    print_kvs(keys.begin(), data.begin(), keys.end(), data.end());

    std::cout << "Shuffling\n";
    std::random_shuffle(
        make_paired_iterator(keys.begin(), data.begin()),
        make_paired_iterator(keys.end(), data.end())
    );

    print_kvs(keys.begin(), data.begin(), keys.end(), data.end());

    std::cout << "Sorting\n";
    std::sort(
        make_paired_iterator(keys.begin(), data.begin()),
        make_paired_iterator(keys.end(), data.end())
    );

    print_kvs(keys.begin(), data.begin(), keys.end(), data.end());

    std::cout << "Sort descending\n";
    std::sort(
        make_paired_iterator(keys.begin(), data.begin()),
        make_paired_iterator(keys.end(), data.end()),
        std::greater< std::pair<int,std::string> >()
    );

    print_kvs(keys.begin(), data.begin(), keys.end(), data.end());

    return 0;
}
like image 30
John Bartholomew Avatar answered Oct 11 '22 14:10

John Bartholomew


Create a vector of objects that contain indices to the two arrays. Define operator< for that object to do the comparison based on keys[index]. Sort that vector. When you're done, walk through that vector and put your original objects into the order defined by those proxy objects:

// warning: untested code.
struct sort_proxy { 
    size_t i;

    bool operator<(sort_proxy const &other) const { 
        return keys[i] < keys[other.i];
    }
};

// ... key_size = number of keys/data items.
std::vector<sort_proxy> proxies(key_size); 

for (i=0; i<keys_size; i++)
    proxies[i].i = i;

std::sort(proxies.begin(), proxies.end());

// in-place reordering left as an exercise for the reader. :-)
for (int i=0; i<proxies[i].size(); i++) {
    result_keys[i] = keys[proxies[i].i];
    result_data[i] = data[proxies[i].i];
}
like image 180
Jerry Coffin Avatar answered Oct 11 '22 16:10

Jerry Coffin


You could use a map:

int main() {
  vector<int> keys;
  vector<string> data;
  keys.push_back(5); data.push_back("joe");
  keys.push_back(2); data.push_back("yaochun");
  keys.push_back(1); data.push_back("holio");

  // load the keys and data to the map (they will automatically be inserted in sorted order by key)
  map<int, string> sortedVals;
  for(int i = 0; i < (int)keys.size(); ++i) {
    sortedVals[keys[i]] = data[i];
  }

  // copy the map values back to vectors  
  int ndx=0;
  for(map<int, string>::iterator it = sortedVals.begin(); it != sortedVals.end(); ++it) {
    keys[ndx] = it->first;
    data[ndx] = it->second;
    ++ndx;
  }

  // verify
  for(int i = 0; i < (int)keys.size(); ++i) {
    cout<<keys[i]<<" "<<data[i]<<endl;
  }

  return 0;
}

Here's the output:

---------- Capture Output ----------
> "c:\windows\system32\cmd.exe" /c c:\temp\temp.exe
1 holio
2 yaochun
5 joe

> Terminated with exit code 0.
like image 22
dcp Avatar answered Oct 11 '22 15:10

dcp