Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't sortBy take (a -> a -> Bool)?

The Haskell sortBy function takes (a -> a -> Ordering) as its first argument. Can anyone educate me as to what the reasoning is there? My background is entirely in languages that have a similar function take (a -> a -> Bool) instead, so having to write one that returns LT/GT was a bit confusing.

Is this the standard way of doing it in statically typed/pure functional languages? Is this peculiar to ML-descended languages? Is there some fundamental advantage to it that I'm not seeing, or some hidden disadvantage to using booleans instead?


Summarizing:

  • An Ordering is not GT | LT, it's actually GT | EQ | LT (apparently GHC doesn't make use of this under the hood for the purposes of sorting, but still)

  • Returning a trichotomic value more closely models the possible outcomes of a comparison of two elements

  • In certain cases, using Ordering rather than a Bool will save a comparison

  • Using an Ordering makes it easier to implement stable sorts

  • Using an Ordering makes it clear to readers that a comparison between two elements is being done (a boolean doesn't inherently carry this meaning, though I get the feeling many readers will assume it)

I'm tentatively accepting Carl's answer, and posting the above summary since no single answer has hit all the points as of this edit.

like image 742
Inaimathi Avatar asked Aug 28 '12 03:08

Inaimathi


3 Answers

I think Boolean Blindness is the main reason. Bool is a type with no domain semantics. Its semantics in the case of a function like sortBy come entirely from convention, not from the domain the function is operating on.

This adds one level of indirection to the mental process involved in writing a comparison function. Instead of just saying "the three values I can return are less than, equal, or greater", the semantic building blocks of ordering, you say "I want to return less than, so I must convert it to a boolean." There's an extra mental conversion step that's always present. Even if you are well-versed in the convention, it still slows you down a bit. And if you're not well-versed in the convention, you are slowed down quite a bit by having to check to see what it is.

The fact that it's 3-valued instead of 2-valued means you don't need to be quite as careful in your sort implementation to get stability, either - but that's a minor implementation detail. It's not nearly as important as actually having your values have meanings. (Also, Bool is no more efficient than Ordering. It's not a primitive in Haskell. They're both algebraic data types defined in libraries.)

like image 93
Carl Avatar answered Nov 02 '22 07:11

Carl


When you sort things, you put them in order; there's not a "truth" value to determine.

More to the point, what would "true" mean? That the first argument is less than the second? Greater than? Now you're overriding "true" to really mean "less than" (or "greater than", depending on how you choose to implement the function). And what if they're equal?

So why not cut out the middle man, so to speak, and return what you really mean?

like image 20
mipadi Avatar answered Nov 02 '22 08:11

mipadi


There's no reason it couldn't. If you look at the ghc implementation, it only checks whether the result is GT or not. The Haskell Report version of the code uses insertBy, which likewise only checks for GT or not. You could write the following and use it without any problem:

sortByBool :: (a -> a -> Bool) -> [a] -> [a]
sortByBool lte = sortBy (\x y -> if lte x y then LT else GT)

sort' :: Ord a => [a] -> [a]
sort' = sortByBool (<=)

Some sorts could conceivably perform optimizations by knowing when elements are EQ, but the implementations currently used do not need this information.

like image 11
Dan Burton Avatar answered Nov 02 '22 07:11

Dan Burton