I'm trying to understand how Haskell list comprehensions work "under the hood" in regards to pattern matching. The following ghci output illustrates my point:
Prelude> let myList = [Just 1, Just 2, Nothing, Just 3]
Prelude> let xs = [x | Just x <- myList]
Prelude> xs
[1,2,3]
Prelude>
As you can see, it is able to skip the "Nothing" and select only the "Just" values. I understand that List is a monad, defined as (source from Real World Haskell, ch. 14):
instance Monad [] where
return x = [x]
xs >>= f = concat (map f xs)
xs >> f = concat (map (\_ -> f) xs)
fail _ = []
Therefore, a list comprehension basically builds a singleton list for every element selected in the list comprehension and concatenates them. If a pattern match fails at some step, the result of the "fail" function is used instead. In other words, the "Just x" pattern doesn't match so [] is used as a placeholder until 'concat' is called. That explains why the "Nothing" appears to be skipped.
What I don't understand is, how does Haskell know to call the "fail" function? Is it "compiler magic", or functionality that you can write yourself in Haskell? Is it possible to write the following "select" function to work the same way as a list comprehension?
select :: (a -> b) -> [a] -> [b]
select (Just x -> x) myList -- how to prevent the lambda from raising an error?
[1,2,3]
While implemenatations of Haskell might not do it directly like this internally, it is helpful to think about it this way :)
[x | Just x <- myList]
... becomes:
do
Just x <- myList
return x
... which is:
myList >>= \(Just x) -> return x
As to your question:
What I don't understand is, how does Haskell know to call the "fail" function?
In do-notation, if a pattern binding fails (i.e. the Just x
), then the fail method is called. For the above example, it would look something like this:
myList >>= \temp -> case temp of
(Just x) -> return x
_ -> fail "..."
So, every time you have a pattern-match in a monadic context that may fail, Haskell inserts a call to fail
. Try it out with IO:
main = do
(1,x) <- return (0,2)
print x -- x would be 2, but the pattern match fails
The rule for desugaring a list comprehension requires an expression of the form [ e | p <- l ]
(where e
is an expression, p
a pattern, and l
a list expression) behave like
let ok p = [e]
ok _ = []
in concatMap ok l
Previous versions of Haskell had monad comprehensions, which were removed from the language because they were hard to read and redundant with the do
-notation. (List comprehensions are redundant, too, but they aren't so hard to read.) I think desugaring [ e | p <- l ]
as a monad (or, to be precise, as a monad with zero) would yield something like
let ok p = return e
ok _ = mzero
in l >>= ok
where mzero
is from the MonadPlus
class. This is very close to
do { p <- l; return e }
which desugars to
let ok p = return e
ok _ = fail "..."
in l >>= ok
When we take the List Monad, we have
return e = [e]
mzero = fail _ = []
(>>=) = flip concatMap
I.e., the 3 approaches (list comprehensions, monad comprehensions, do
expressions) are equivalent for lists.
I don't think the list comprehension syntax has much to do with the fact that List ([]
), or Maybe
for that matter, happens to be an instance of the Monad
type class.
List comprehensions are indeed compiler magic or syntax sugar, but that's possible because the compiler knows the structure of the []
data type.
Here's what the list comprehension is compiled to: (Well, I think, I didn't actually check it against the GHC)
xs = let f = \xs -> case xs of
Just x -> [x]
_ -> []
in concatMap f myList
As you can see, the compiler doesn't have to call the fail
function, it can simply inline a empty list, because it knows what a list is.
Interestingly, this fact that the list comprehensions syntax 'skips' pattern match failures is used in some libraries to do generic programming. See the example in the Uniplate library.
Edit: Oh, and to answer your question, you can't call your select
function with the lambda you gave it. It will indeed fail on a pattern match failure if you call it with an Nothing
value.
You could pass it the f
function from the code above, but than select
would have the type:
select :: (a -> [b]) -> [a] -> [b]
which is perfectly fine, you can use the concatMap
function internally :-)
Also, that new select
now has the type of the monadic bind operator for lists (with its arguments flipped):
(>>=) :: [a] -> (a -> [b]) -> [b]
xs >>= f = concatMap f xs -- 'or as you said: concat (map f xs)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With