The Haskell sortBy
function takes (a -> a -> Ordering)
as its first argument. Can anyone educate me as to what the reasoning is there? My background is entirely in languages that have a similar function take (a -> a -> Bool)
instead, so having to write one that returns LT
/GT
was a bit confusing.
Is this the standard way of doing it in statically typed/pure functional languages? Is this peculiar to ML-descended languages? Is there some fundamental advantage to it that I'm not seeing, or some hidden disadvantage to using booleans instead?
Summarizing:
An Ordering
is not GT | LT
, it's actually GT | EQ | LT
(apparently GHC
doesn't make use of this under the hood for the purposes of sorting, but still)
Returning a trichotomic value more closely models the possible outcomes of a comparison of two elements
In certain cases, using Ordering
rather than a Bool
will save a comparison
Using an Ordering
makes it easier to implement stable sorts
Using an Ordering
makes it clear to readers that a comparison between two elements is being done (a boolean doesn't inherently carry this meaning, though I get the feeling many readers will assume it)
I'm tentatively accepting Carl's answer, and posting the above summary since no single answer has hit all the points as of this edit.
I think Boolean Blindness is the main reason. Bool
is a type with no domain semantics. Its semantics in the case of a function like sortBy
come entirely from convention, not from the domain the function is operating on.
This adds one level of indirection to the mental process involved in writing a comparison function. Instead of just saying "the three values I can return are less than, equal, or greater", the semantic building blocks of ordering, you say "I want to return less than, so I must convert it to a boolean." There's an extra mental conversion step that's always present. Even if you are well-versed in the convention, it still slows you down a bit. And if you're not well-versed in the convention, you are slowed down quite a bit by having to check to see what it is.
The fact that it's 3-valued instead of 2-valued means you don't need to be quite as careful in your sort implementation to get stability, either - but that's a minor implementation detail. It's not nearly as important as actually having your values have meanings. (Also, Bool
is no more efficient than Ordering
. It's not a primitive in Haskell. They're both algebraic data types defined in libraries.)
When you sort things, you put them in order; there's not a "truth" value to determine.
More to the point, what would "true" mean? That the first argument is less than the second? Greater than? Now you're overriding "true" to really mean "less than" (or "greater than", depending on how you choose to implement the function). And what if they're equal?
So why not cut out the middle man, so to speak, and return what you really mean?
There's no reason it couldn't. If you look at the ghc implementation, it only checks whether the result is GT
or not. The Haskell Report version of the code uses insertBy
, which likewise only checks for GT
or not. You could write the following and use it without any problem:
sortByBool :: (a -> a -> Bool) -> [a] -> [a]
sortByBool lte = sortBy (\x y -> if lte x y then LT else GT)
sort' :: Ord a => [a] -> [a]
sort' = sortByBool (<=)
Some sorts could conceivably perform optimizations by knowing when elements are EQ
, but the implementations currently used do not need this information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With