Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting in JavaScript: Shouldn't returning a boolean be enough for a comparison function?

I have always successfully sorted my arrays like this (when I did not want the standard lexicographic ordering):

var arr = […] // some numbers or so
arr.sort(function(a, b) {
    return a > b;
});

Now, someone told me this was wrong, and that I would need to return a-b instead. Is that true, and if yes why? I have tested my comparison function, and it works! Also, why would my solution be so common when it is wrong?

like image 583
Bergi Avatar asked Jun 06 '14 11:06

Bergi


People also ask

Can a function return a Boolean value JavaScript?

You can return a boolean value from a JavaScript function. Create a function and use the if statement to evaluate the given value to the function. And return true or false according to the conditions.

What is sorting in JavaScript?

function(a, b){return a - b} When the sort() function compares two values, it sends the values to the compare function, and sorts the values according to the returned (negative, zero, positive) value. If the result is negative a is sorted before b . If the result is positive b is sorted before a .

How does sort compare work?

Then, when you call sort(), with your custom compare function, the compare function is called on pairs in your to-be-sorted list, to determine the proper ordering. Simply subtracting b from a will always return greater than zero if a is larger than b, 0 if they are equal, or less than zero if a is less than b.


2 Answers

TL;DR

I have always successfully sorted my arrays like this

No, you have not. And didn't notice it. Some quick counterexamples:

> [0,1,0].sort(function(a,b){ return a>b })
Array [0, 1, 0] // in Chrome, in Internet Exploder 11.
// Results may vary between sorting algorithm implementations
> [1,1,0,2].sort(function(a, b){ return a>b })
Array [0, 1, 2, 1] // in Opera 12.
Array [1, 1, 0, 2] // in Chrome 100, in Internet Exploder 11.
// Results will vary between sorting algorithm implementations

why?

Because your comparison function does return false (or 0, equivalently) even when b is larger than a. But 0 implies that the two elements are considered equal - and the sorting algorithm believes that.

In-Depth Explanation

Comparison functions in JavaScript

How do comparison functions work?

The Array::sort method can take an optional, custom comparison function as its argument. That function takes two arguments (commonly referred to as a and b) which it should compare, and is supposed to return a number

  • > 0 when a is considered larger than b and should be sorted after it
  • == 0 when a is considered equal to b and it doesn't matter which comes first
  • < 0 when a is considered smaller than b and should be sorted before it

If it does not return a number, the result will be cast to a number (which is handy for booleans). The returned number does not need to be exactly -1 or 0 or 1 (though it typically is).

Consistent ordering

To be consistent, the compare function would need to fulfill the equation

comp(a, b) == -1 * comp(b, a)
// or, if values other than -1, 0 and 1 are considered:
comp(a, b) * comp(b, a) <= 0

If that requirement is broken, the sort will behave undefined.

Citing the ES5.1 spec on sort (same thing in the ES6 spec):

If comparefn is […] not a consistent comparison function for the elements of this array, the behaviour of sort is implementation-defined.

A function comparefn is a consistent comparison function for a set of values S if all of the requirements below are met for all values a, b, and c (possibly the same value) in the set S: The notation a <CF b means comparefn(a,b) < 0; a =CF b means comparefn(a,b) = 0 (of either sign); and a >CF b means comparefn(a,b) > 0.

Calling comparefn(a,b) always returns the same value v when given a specific pair of values a and b as its two arguments. Furthermore, Type(v) is Number, and v is not NaN. Note that this implies that exactly one of a <CF b, a =CF b, and a >CF b will be true for a given pair of a and b.

  • Calling comparefn(a,b) does not modify the this object.
  • a =CF a (reflexivity)
  • If a =CF b, then b =CF a (symmetry)
  • If a =CF b and b =CF c, then a =CF c (transitivity of =CF)
  • If a <CF b and b <CF c, then a <CF c (transitivity of <CF)
  • If a >CF b and b >CF c, then a >CF c (transitivity of >CF)

NOTE: The above conditions are necessary and sufficient to ensure that comparefn divides the set S into equivalence classes and that these equivalence classes are totally ordered.

Uh, what does this mean? Why should I care?

A sorting algorithm needs to compare items of the array with each other. To do a good and efficient job, it must not need to compare each item to every other, but needs to be able to reason about their ordering. For that to work well, there are a few rules that a custom comparison function needs to abide. A trivial one is that an item a is equal to itself (compare(a, a) == 0) - that's the first item in the list above (reflexivity). Yes, this is a bit mathematical, but pays of well.

The most important one is transitivity. It says that when the algorithm has compared two values a and b, and also b with c, and has found out by applying the comparison function that e.g. a = b and b < c, then it can expect that a < c holds as well. This seems only logical, and is required for a well-defined, consistent ordering.

But your comparison function does fail this. Lets look at this example:

 function compare(a, b) { return Number(a > b); }
 compare(0, 2) == 0 // ah, 2 and 0 are equal
 compare(1, 0) == 1 // ah, 1 is larger than 0
 // let's conclude: 1 is also larger than 2

Ooops. And that is why a sorting algorithm can fail (in the spec, this is be "implementation-dependent behaviour" - i.e. unpredictable results) when it is invoked with a comparison function that is not consistent.

Why is the wrong solution so common?

Because in many other languages, there are sorting algorithms that don't expect a three-way comparison but merely a boolean smaller-than operator. C++ std::sort is a good example of that. It will simply be applied twice with swapped arguments if an equality needs to be determined. Admittedly, this can be more efficient and is less error-prone, but needs more calls to the comparison function if the operator cannot be inlined.

CounterExamples

I have tested my comparison function, and it works!

Only by sheer luck, if you tried some random example. Or because your test suite is flawed - incorrect and/or incomplete.

Here is the small script I used to find the above minimal counterexample:

function perms(n, i, arr, cb) {
// calls callback with all possible arrays of length n
    if (i >= n) return cb(arr);
    for (var j=0; j<n; j++) {
        arr[i] = j;
        perms(n, i+1, arr, cb);
    }
}
for (var i=2; ; i++) // infinite loop
    perms(i, 0, [], function(a) {
        if (    a.slice().sort(function(a,b){ return a>b }).toString()
             != a.slice().sort(function(a,b){ return a-b }).toString() )
            // you can also console.log() all of them, but remove the loop!
            throw a.toString();
    });

What comparison function is correct?

Use no comparison function at all, when you want a lexicographic sort. Items in the array will be stringified if necessary.

A generic comparison function that works like the relational operators can be implemented as

function(a, b) {
    if (a > b) return 1;
    if (a < b) return -1;
    /* else */ return 0;
}

With a few tricks, this can be minified to the equivalent function(a,b){return +(a>b)||-(a<b)}.

For numbers, you can simply return their difference, which abides all the laws above:

function(a, b) {
    return a - b; // but make sure only numbers are passed (to avoid NaN)
}

If you want to sort in reverse, just take the appropriate one and swap a with b.

If you want to sort composite types (objects etc), replace each a and each b with an access of the properties in question, or a method call or whatever you want to sort by.

like image 58
Bergi Avatar answered Oct 25 '22 00:10

Bergi


The sort function expects a function that expects two arguments a and b, and returns:

  • A negative number if a comes before b
  • A positive number if a comes after b
  • Zero if relative order of a and b does not matter

In order to sort numbers in ascending order return a - b will produce the correct return values; for example:

a    b    ret
1    2    -1
3    2     1
2    2     0

On the other hand return a > b produces the following return values:

a    b    ret      implied
1    2    false    0
3    2    true     1
2    2    false    0

In the above example, the sort function is told that 1 and 2 are same (and placing 1 before 2 or 2 before 1 does not matter). This will produce incorrect result, for example (in Chrome 49):

console.log([5, 8, 7, 1, 2, 3, 4, 6, 9, 10, 11, 12, 13].sort(function(a, b) {
    return a > b;
}));
// [4, 5, 3, 1, 2, 6, 7, 8, 9, 10, 11, 12, 13]
like image 44
Salman A Avatar answered Oct 25 '22 00:10

Salman A