The function fromJust
in Data.Maybe
is defined in this way:
fromJust :: Maybe a -> a
fromJust Nothing = error "Maybe.fromJust: Nothing"
fromJust (Just x) = x
According to my understanding of pattern matching (matching proceeds from top to bottom) I would change the order of the two definitions. Since usually the Nothing-part doesn't match in a it-is-sure-a-Just situation, but it is always checked before the second definition is reached.
Can you please clarify my error in reasoning? Thanks.
Edit:
Example:
Suppose I have a file with a million numbers of type Int
per line and y need this numbers (as Int
, not String
) in my program for other stuff.
import qualified Data.ByteString.Lazy.Char8 as L
readInt = fst . fromJust . L.readInt
-- more stuff
With the above definition of fromJust
I would need more time to read the numbers, wouldn't I?
I think this question is about performance. While semantically pattern matching is tested top-to-bottom, most Haskell compilers will optimize matching on an ADT's constructors to the equivalent of a C switch
statement.
You can think of the data representation for an ADT has a "tag" which says which constructor it was made with, together with a pointer for each argument. So for example, Nothing
might be represented as 0 (null)
and Just 42
represented as 1 (pointer to 42)
.
Then in a function like this:
squash :: Maybe (Maybe Int) -> Int
squash (Just Nothing) = 0
squash Nothing = 0
squash (Just (Just x)) = x
The compiler would set up a decision tree:
squash x =
check tag of x:
0 -> 0
1 y -> check tag of y:
0 -> 0
1 z -> z
Where each tag is computed by a jump table or something, so it is no more expensive to check against 0
as 1
. Note that this same decision tree would be made no matter what the original order of the patterns in our definition.
However, when using guards instead of matching on a constructor, the patterns are most likely checked from top to bottom (the compiler would have to be very smart to optimize that). So if we wrote fromJust
in this arcane way:
fromJust x
| isNothing x = error "fromJust: Nothing"
| isJust x = case x of { Just y -> y }
Then this would probably check for each constructor in turn, and we could optimize by switching the order of the cases. Fortunately, writing in a way that this matters is cumbersome :-).
There are two important things to realize here: First, the Haskell compiler is fully aware that any value can only either be Nothing
or Just x
. Therefore it will only ever test for one of them. Second, GHC uses pointer tagging that allows it to distinguish between pointers pointing at various constructors very fast (details).
As a result, GHC generates quite nice code for the pattern match. Here's what the fromJust
return code looks like, taken directly from the assembly dump:
andq $7,%rax
cmpq $2,%rax
jae .LcO4
This takes the return value, then extracts the tag using a bit mask operation (7 = 111 in binary). If this tag is 2, then the code knows that it points to a Just x
constructor. Therefore it jumps to the appropriate code. Otherwise, it just continues to the code dealing with Nothing
.
I think this could even be optimized further using the knowledge that there is only ever one Nothing
closure in the program. Therefore you could replace that with a single address comparison, saving the bit-masking operations and even a register in the actual code. Not sure why GHC doesn't do this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With