Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pattern matching in definition of fromJust

The function fromJust in Data.Maybe is defined in this way:

fromJust          :: Maybe a -> a
fromJust Nothing  = error "Maybe.fromJust: Nothing"
fromJust (Just x) = x

According to my understanding of pattern matching (matching proceeds from top to bottom) I would change the order of the two definitions. Since usually the Nothing-part doesn't match in a it-is-sure-a-Just situation, but it is always checked before the second definition is reached.

Can you please clarify my error in reasoning? Thanks.

Edit:

Example: Suppose I have a file with a million numbers of type Int per line and y need this numbers (as Int, not String) in my program for other stuff.

import qualified Data.ByteString.Lazy.Char8 as L

readInt = fst . fromJust . L.readInt
-- more stuff

With the above definition of fromJust I would need more time to read the numbers, wouldn't I?

like image 290
Jogusa Avatar asked Jun 08 '11 13:06

Jogusa


2 Answers

I think this question is about performance. While semantically pattern matching is tested top-to-bottom, most Haskell compilers will optimize matching on an ADT's constructors to the equivalent of a C switch statement.

You can think of the data representation for an ADT has a "tag" which says which constructor it was made with, together with a pointer for each argument. So for example, Nothing might be represented as 0 (null) and Just 42 represented as 1 (pointer to 42).

Then in a function like this:

squash :: Maybe (Maybe Int) -> Int
squash (Just Nothing) = 0
squash Nothing = 0
squash (Just (Just x)) = x

The compiler would set up a decision tree:

squash x = 
   check tag of x:
       0 -> 0
       1 y -> check tag of y:
           0 -> 0
           1 z -> z

Where each tag is computed by a jump table or something, so it is no more expensive to check against 0 as 1. Note that this same decision tree would be made no matter what the original order of the patterns in our definition.

However, when using guards instead of matching on a constructor, the patterns are most likely checked from top to bottom (the compiler would have to be very smart to optimize that). So if we wrote fromJust in this arcane way:

fromJust x
    | isNothing x = error "fromJust: Nothing"
    | isJust x = case x of { Just y -> y }

Then this would probably check for each constructor in turn, and we could optimize by switching the order of the cases. Fortunately, writing in a way that this matters is cumbersome :-).

like image 53
luqui Avatar answered Sep 19 '22 14:09

luqui


There are two important things to realize here: First, the Haskell compiler is fully aware that any value can only either be Nothing or Just x. Therefore it will only ever test for one of them. Second, GHC uses pointer tagging that allows it to distinguish between pointers pointing at various constructors very fast (details).

As a result, GHC generates quite nice code for the pattern match. Here's what the fromJust return code looks like, taken directly from the assembly dump:

        andq $7,%rax
        cmpq $2,%rax
        jae .LcO4

This takes the return value, then extracts the tag using a bit mask operation (7 = 111 in binary). If this tag is 2, then the code knows that it points to a Just x constructor. Therefore it jumps to the appropriate code. Otherwise, it just continues to the code dealing with Nothing.

I think this could even be optimized further using the knowledge that there is only ever one Nothing closure in the program. Therefore you could replace that with a single address comparison, saving the bit-masking operations and even a register in the actual code. Not sure why GHC doesn't do this.

like image 41
Peter Wortmann Avatar answered Sep 19 '22 14:09

Peter Wortmann