Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why must std::sort compare function return false when arguments are equal?

Tags:

c++

sorting

std

In std::sort you can supply a third argument which is the basis for how it sorts a list. If you want the first argument to come first, then you return true. If you want the second argument to come first you return false. I have come across the problem that my predicate function supposedly is an "invalid comparator", and I have narrowed it down to the fact that it does not fulfil the following requirement:

if arg1 == arg2, compare function MUST return false.

There have been some terms I have seen such as that std::sort requires "strict weak ordering". Apart from 2 places, all the other pages I get about these topics seem to be technical papers, and I can't understand it. What I can understand of it is that:

In weak ordering some elements "may be tied" with each other.

But to me this is also the meaning of a "partially ordered set", which is:

"there may be pairs of elements for which neither element precedes the other"

Furthermore, I can't understand what the "strict" implies in either of them.

Leaving aside my confusion about order theory terminology, my question is if in the compare function argument 1 and argument 2 are equal, and in which case I don't care which comes before the other (either one coming before would make me happy), why can't I return true to have argument 1 come first?

I was also going to ask how my program actually knows it's an invalid comparator but then I thought it probably just checks to see if arg1 and arg2 are equal when the compare function returns true.

like image 346
Zebrafish Avatar asked Aug 29 '17 01:08

Zebrafish


3 Answers

The compare function simply models a "less than" operator. Consider how the < operator works for primitive types like int:

int a = 1, b = 2;     a < b == true      a is less than b
int a = 2, b = 1;     a < b == false     a is not less than b, because a is greater than b
int a = 1, b = 1;     a < b == false     a is not less than b, because a equals b

Returning true means you want a to be ordered before b. So return false if that is not the case, either because you want b to be ordered before a, or because their order doesn't matter.

If you return true when the arguments are equal, then you are saying that you want a to come before b and you want b to come before a, which is a contradiction.

like image 155
Oktalist Avatar answered Nov 14 '22 14:11

Oktalist


The algorithm which std::sort uses is unspecified. By using a comparison function which does not meet the requirements set by the standard, you break the algorithm's assumptions, and cause undefined behavior.

Look what happens in the output of this noisy comparison function which uses <= (which is not a valid comparator) instead of < (which is valid).

http://coliru.stacked-crooked.com/a/34e4cef3280f3728

In the output, I am printing an incrementing variable (for reference to point out when the algorithm goes haywire), followed by the value of the first argument and its position in the vector, and then the second argument and its position in the vector. An example looks like this:

124: 2@12 <= 4@7

Which means this is the 124th invocation of the comparison function, and it is comparing the value 2 at index 12, with the value 4 at index 7.

Things go crazy starting at line 37

37: 0@0 <= 0@-1
38: 0@0 <= 144@-2
39: 0@0 <= 0@-3
40: 0@0 <= 192@-4

It is comparing values that I didn't even insert into the vector (144, 192, etc...) and at indexes outside the range of the vector (negative indexes, in this case).

like image 6
Benjamin Lindley Avatar answered Nov 14 '22 12:11

Benjamin Lindley


1. From a code perspective

When elements count is greater than _S_threshold (in STL, default value is 16), std::sort() will use quicksort.

The following code is __unguarded_partition function in quicksort.

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
             _RandomAccessIterator __last,
             const _Tp& __pivot, _Compare __comp) 
   {
      while (true)
      {
        while (__comp(*__first, __pivot))
          ++__first;
        --__last;
        while (__comp(__pivot, *__last))
          --__last;
        if (!(__first < __last))
          return __first;
        std::iter_swap(__first, __last);
        ++__first;
      }
    }

if arg1 == arg2, compare function return true, then __first iterator will keep moving to the right, and the program will go out of range and cause coredump.

while (__comp(*__first, __pivot))
    ++__first;

so, if arg1 == arg2, compare function MUST return false.

2. From a algorithm logic perspective

If comparison function uses operator<=, then it returns true for equal values. If test to see whether 10B is equivalent to 10A, it naturally uses the comparison function. In this example, that's operator<=. Tt checks to see whether this expression is true:

!(10A<= 10B)&&!(10B<= 10A) //test 10Aand 10B for equivalence

Well, 10A and 10B are both 10, so it's clearly true that 10A <= 10B. Equally clearly, 10B <= 10A. The above expression thus simplifies to

!(true)&&!(true)

and that simplifies to

false && false

which is simply false. That is, the test concludes that 10A and 10B are not equivalent, hence not the same. Furthermore, any comparison function where equal values return true will do the same thing. Equal values are, by definition, not equivalent! Isn't that cool?

You also can see << Effective STL>>, Item21: Always have comparison functions return false for equal values.

like image 6
Terry Avatar answered Nov 14 '22 12:11

Terry