Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting only using the less-than operator compared to a trivalue compare function

In C++/STL sorting is done by using only the less-than operator. Altough I have no idea how the sorting algorithms are actually implemented, I assume that the other operations are created implicite:

a > b *equals* b < a == true
a == b *equals* !(a < b) && !(b < a)

Compared to using a trivalue* compare function, like for example Java, is this good for performance, or why was this design decision made?

My assumption is that any trivalue compareto function still has to implement these comparissons in itself, resulting in the same performance.

**by trivalue compare function, I mean a compare function which returns -1, 0 and 1 for less than, equal and higher than*

Update: It seems a spaceship <=> operator are coming in C++20 so obviously the committee thought there were downsides of using only operator<.

like image 500
Viktor Sehr Avatar asked Feb 04 '23 05:02

Viktor Sehr


1 Answers

In a sense the other two are implicit, but more accurate would be to say that a comparison sort doesn't actually need a tri-valued comparator, and C++'s sorts are implemented in a way which doesn't use one in order to minimise the behaviour required of the comparator.

It would be wrong for std::sort to define and exclusively use something like this:

template <typename T, typename Cmp>
int get_tri_value(const T &a, const T &b, Cmp lessthan) {
    if (lessthan(a,b)) return -1;
    if (lessthan(b,a)) return 1;
    return 0;
}

... because you'd end up with an inefficient algorithm in terms of number of calls to lessthan. If your algorithm doesn't do anything useful with the difference between a 1 return and a 0 return, then you've wasted a comparison.

C++ refers to "strict weak orderings". If < is a strict weak ordering, and !(a < b) && !(b < a), it doesn't necessarily follow that a == b. They're just "in the same place" in the ordering, and !(a < b) && !(b < a) is an equivalence relation. So the comparator required by sort orders equivalence classes of objects, it doesn't provide a total order.

The only difference it makes is what you say when !(a < b). For a strict total order, you would deduce b <= a, read "less than or equal to". For a strict weak order, you can't define b <= a to mean b < a || b == a and have this be true. C++ is pedantic about this, and since it allows operator overloading it pretty much has to be, since people overloading operators need the jargon in order to tell users of their code what they can expect in terms of how the operators relate. Java does talk about the comparator and the hashCode being consistent with equals, which is all you need. C++ has to deal with <, >, ==, <=, >=, the post-condition of assignment, and so on.

C++ takes quite a pure mathematical approach to this in the API, so everything is defined in terms of the single binary relation. Java is friendlier in some respects, and prefers three-way comparisons where the definition of the fundamental unit (the comparison) is a bit more complex, but the logic leading from it is simpler. It also means the sort algorithm gets more information per comparison, which occasionally is useful. For an example, see the "Dutch flag" quicksort optimisation, which is a benefit when there are a lot of "in the same place" duplicates in the data.

In that case, a three-values comparator is a speed gain. But C++ uses a consistent definition of a comparator for sort and also for set and map, lower_bound and so on, which barely benefit from a three-value comparator (maybe save one comparison, maybe not). I'd guess they decided not to complicate their nice, general interface in the interests of specific or limited potential efficiency gains.

like image 199
Steve Jessop Avatar answered Feb 06 '23 15:02

Steve Jessop