Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ set with arbitrary comparator

Tags:

c++

I have the following C++ code

#include <set>
#include <string>
#include <iostream>
using namespace std;

class Pair {
  public:
    string lhs;
    string rhs;
    Pair();
    Pair( string l, string r ) {
      lhs=l;
      rhs=r;
    };
};

struct compare {
  bool operator()(const Pair& a, const Pair& b) const{
    if ( ( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs ) ) {
      cout << "MATCH" << endl;
    }
    return ( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs );
  }
};

int main () {
  set<Pair, compare > s;
  Pair p( string("Hello"), string("World") );
  s.insert(p);
  cout << s.size() << "\n";
  Pair q( string("World"), string("Hello") );
  s.insert(q);
  cout << s.size() << "\n";
  compare cmp;
  cout << cmp( p, q );

  return 0;
}

Invoking the compiled code gives:

1
MATCH
MATCH
2
MATCH

Somehow the set s ends up with both Pairs p, and q in spite of the fact that the comparator identifies them as identical. Why?

Any help will be much appreciated!

UPDATE:

Many thanks for the great answers and your kind and professional help. As you might have guessed already, I am quite a newby to C++.

Anyway, I was wondering, if Antoine's answer could be done with a lambda expression?

Something like:

std::set< …, [](){ my_comparator_code_here } > s;

????

like image 245
user3139868 Avatar asked Dec 15 '22 03:12

user3139868


2 Answers

The comparison operator for a std::set (which is an ordered container) needs to identify a strict weak ordering not any arbitrary test you wish. Normally a properly implemented operator< does the job.

If your comparison operator does not provide a strict weak ordered (as yours does not) the behavior will be undefined. There is no way to work around this requirement of the C++ standard.

Note that in certain cases where an equality comparison is needed it will have to use the operator< twice to make the comparison.

Also have you considered using std::pair<std::string, std::string> instead of rolling your own?

I've reread your question about five times now and I'm starting to wonder if what you want is a set of pairs where which string is in first and second doesn't matter as far as the comparison goes. In that case @Antoine has what appears to be the correct solution for you.

like image 115
Mark B Avatar answered Dec 27 '22 10:12

Mark B


A comparator for a set, map or any algorithm such as lower_bound or sort which require an order need to implement a strict weak ordering (basically, behave like a <).

Such an ordering is required to have 3 properties:

  • irreflexive: not (a < a) is always true
  • asymmetric: a < b implies not (b < a)
  • transitive: a < b and b < c imply a < c

Which you will not < has.

Such an ordering defines equivalence classes, which are groups of elements that compare equal according to the ordering (that is not (a < b) and not (b < a) is verified). In a set or map, only a single element per equivalence class can be inserted whereas a multiset or multimap may hold multiple elements per equivalence class.

Now, if you look at your comparator, you will realize that you have implemented == which does not define any order at all. You need to implement something akin to < instead.

A simple, but extremely efficient trick, is to use tuples which have < (and == and any other comparison operator) already implemented in a lexicographical order. Thus, std::tuple<std::string, std::string> has exactly the order you which; and even better, std::tuple<std::string const&, std::string const&> also has it, and can be constructed very easily using std::tie.

Therefore, the implementation of a straightforward comparator is as simple as:

struct comparator {
    bool operator()(Pair const& left, Pair const& right) const {
        return std::tie( left.a,  left.b)
             < std::tie(right.a, right.b);
    }
};

Note: although not discussed much, it is absolutely essential that the ordering of the comparator be stable across calls. As such, it should generally only depend on the values of the elements, and nothing external or runtime-related (such as their addresses in memory)


EDIT: as noted, your comparator is slightly more complicated.

In your case, though, you also need to take into account that a and b have a symmetric role. In general, I would suggest uniquifying the representation in the constructor of the object; if not possible, you can uniquify first and compare second:

struct comparator {
    bool operator()(Pair const& left, Pair const& right) const {
        auto uleft = left.a < left.b ? std::tie(left.a, left.b)
                                     : std::tie(left.b, left.a);
        auto uright = right.a < right.b ? std::tie(right.a, right.b)
                                        : std::tie(right.b, right.a);

        assert(get<0>(uleft) <= get<1>(uleft) and "Incorrect uleft");
        assert(get<0>(uright) <= get<1>(uright) and "Incorrect uright");

        return uleft < uright;
    }
}; // struct comparator
like image 43
Matthieu M. Avatar answered Dec 27 '22 12:12

Matthieu M.