I am writing a propositional logic parser in Haskell. I am doing the parsing by hand for now as a learning exercise. Eventually I will tackle Parsec. In the mean time, I am trying to wrap my head around Monads. In particular, I am using Maybe
to report errors from my parse
function. My current trouble is with part of a helper function:
parse' :: String -> (Maybe Wff, String)
parse' ('[':rest) = (x, if null rest''
then ""
else tail rest'')
where (a, rest') = parse' rest
(b, rest'') = parse' (if null rest'
then ""
else tail rest')
x = if null rest'
|| null rest''
|| head rest' /= '|'
|| head rest'' /= ']'
then Nothing
else Or <$> a <*> b
(For reference, the full parse
function can be found here.)
This code parses a proposition of the form [ A | B ]
where A
and B
are any arbitrary propositions. As you can see, I am using applicative style on the last line to propagate the Nothing
result if a previous recursive call results in a Nothing
. This allowed me to take out a == Nothing
and b == Nothing
from the if
condition. How can I use the Applicative
or Monad
instance of Maybe
to colapse the rest of this if
?
I will actually do the problem backwards: I will begin from the monadic solution and work backwards from it to the hand-rolled solution. This will produce the same code that you would get if you arrived at the correct solution by hand.
The typical type signature of a monadic parser is of the form:
type Parser a = String -> Maybe (a, String)
Notice the slight difference with your form, where you have the final String
on the outside of the Maybe
. Both are valid, but I prefer this form more because I consider the leftovers String
invalid if the parse failed.
This type is actually a special case of StateT
, which is defined as:
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
Notice that if we choose:
s = String
m = Maybe
... we get back the Parser
type:
type Parser a = StateT String Maybe a
-- or: type Parser = StateT String Maybe
What's cool about this is that we only need to define one parser by hand, which is the parser that retrieves a single character:
anyChar :: Parser Char
anyChar = StateT $ \str -> case str of
[] -> Nothing
c:cs -> Just (c, cs)
Notice that if we removed the StateT
wrapper, the type of anyChar
would be:
anyChar :: String -> Maybe (Char, String)
When we wrap it in StateT
it becomes:
anyChar :: StateT String Maybe Char
... which is just Parser Char
.
Once we have that primitive parser, we can define all the other parsers using StateT
's Monad
interface. For example, let's define a parser that matches a single character:
import Control.Monad
char :: Char -> Parser ()
char c' = do
c <- anyChar
guard (c == c')
That was easy! guard
requires a MonadPlus
instance for our monad, but we already have one. The reason why is thanks to the following two MonadPlus
instances:
instance (MonadPlus m) => MonadPlus (StateT s m) where ...
instance MonadPlus Maybe where ...
The combination of these two instances means that StateT s Maybe
automatically implements MonadPlus
, so we can use guard
and it will just magically do "the right thing".
With those two parser in hand, your final parser becomes very easy to write:
data Wff = Or Char Char deriving (Show)
parseWff :: Parser Wff
parseWff = do
char '['
a <- anyChar
char '|'
b <- anyChar
char ']'
return (Or a b)
That's much clearer and easier to understand what is going on. It also works:
>>> runStateT parseWff "[A|B]"
Just (Or 'A' 'B',"")
This brings us to your original question: How do you hand-write the same behavior? We will work backwards from the Monad
and MonadPlus
instances to deduce what they are doing under the hood for us.
To do this, we must deduce what the Monad
and MonadPlus
instances for StateT
reduce to when its base monad is Maybe
. Let's begin from the Monad
instance for StateT
:
instance (Monad m) => Monad (StateT s m) where
return r = StateT (\s -> return (r, s))
m >>= f = StateT $ \s0 -> do
(a, s1) <- runStateT m s0
runStateT (f a) s1
Notice that it is defined in generically terms of the base monad. This means we also need the Monad
instance for Maybe
to derive what the above code does:
instance Monad Maybe where
return = Just
m >>= f = case m of
Nothing -> Nothing
Just a -> f a
If we substitute the Maybe
monad instance into the StateT
monad instance we get:
instance Monad (StateT s Maybe) where
return r = StateT (\s -> Just (r, s))
m >>= f = StateT $ \s0 -> case runStateT m s0 of
Nothing -> Nothing
Just (a, s1) -> runStateT (f a) s1
We can do the same thing to derive the Monad
instance for StateT s Maybe
. We just have to take the MonadPlus
instances for StateT
and `Maybe:
instance (MonadPlus m) => MonadPlus (StateT s m) where
mzero = StateT (\_ -> mzero)
mplus (StateT f) (StateT g) = StateT (\s -> mplus (f s) (g s))
instance MonadPlus Maybe where
mzero = Nothing
mplus m1 m2 = case m1 of
Just a -> Just a
Nothing -> case m2 of
Just b -> Just b
Nothing -> Nothing
... and combine them into one:
instance MonadPlus (StateT s Maybe) where
mzero = StateT (\_ -> Nothing)
mplus (StateT f) (StateT g) = StateT $ \s -> case f s of
Just a -> Just a
Nothing -> case g s of
Just b -> Just b
Nothing -> Nothing
Now we can derive what our parsers are doing under the hood. Let's begin with the char
parser:
char c' = do
c <- anyChar
guard (c == c')
This desugars to:
char c' = anyChar >>= \c -> guard (c == c')
We just derived what (>>=)
does for StateT s Maybe
, so lets substitute that in:
char c' = StateT $ \str0 -> case runStateT anyChar str0 of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
We already know the definition of anyChar
, so let's substitute that in:
char c' = StateT $ \str0 -> case runStateT (StateT $ \str -> case str of
[] -> Nothing
c:cs -> Just (c, cs) ) str0 of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
We also know that runStateT
is the inverse of StateT
, so:
char c' = StateT $ \str0 -> case (\str -> case str of
[] -> Nothing
c:cs -> Just (c, cs) ) str0 of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
We can then apply the lambda to str0
:
char c' = StateT $ \str0 -> case (case str0 of
[] -> Nothing
c:cs -> Just (c, cs) ) of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
Now we distribute the outer case statement over the inner case statement:
char c' = StateT $ \str0 -> case str0 of
[] -> case Nothing of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
c:cs -> case Just (c, cs) of
Nothing -> Nothing
Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
... and evaluate the case statements:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> runStateT ((\c -> guard (c == c')) c) cs
Then we can apply the lambda to c
:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> runStateT (guard (c == c')) cs
To simplify that further, we need to know what guard
does. Here is the source code for it:
guard pred = if pred then return () else mzero
We already know what return
and mzero
for StateT s Maybe
are, so let's substitute them in:
guard pred = if pred then StateT (\s -> Just ((), s)) else StateT (\_ -> Nothing)
Now we can inline that into our function:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> runStateT (if (c == c')
then StateT (\s -> Just ((), s))
else StateT (\_ -> Nothing) ) cs
If we distribute runStateT
over that we get:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> (if (c == c')
then (\s -> Just ((), s))
else (\_ -> Nothing) ) cs
Similarly, we can apply both branches to cs
:
char c' = StateT $ \str0 -> case str0 of
[] -> Nothing
c:cs -> if (c == c') then Just ((), cs) else Nothing
That's the equivalent code we would have written by hand had we not used the Monad
or MonadPlus
instances at all.
I will now repeat this process for the last function, but leave the derivation as an exercise for you:
parseWff = do
char '['
a <- anyChar
char '|'
b <- anyChar
char ']'
return (Or a b)
parseWff = StateT $ \str0 -> case str0 of
[] -> Nothing
c1:str1 -> if (c1 == '[')
then case str1 of
[] -> Nothing
c2:str2 -> case str2 of
[] -> Nothing
c3:str3 -> if (c3 == '|')
then case str3 of
[] -> Nothing
c4:str4 -> case str4 of
[] -> Nothing
c5:str5 -> if (c5 == ']')
then Just (Or c2 c4, str5)
else Nothing
else Nothing
else Nothing
... but we can further simplify that to:
parseWff = StateT $ \str0 -> case str0 of
'[':c2:'|':c4:']':str5 -> Just (Or c2 c4, str5)
_ -> Nothing
Notice that, unlike the function you wrote, this doesn't use any partial functions like tail
or incomplete pattern matches. Also, the code you wrote doesn't compile, but even if it did, it would still give the wrong behavior.
You can use a function from Control.Monad
called guard
. This has a slightly odd type:
guard :: MonadPlus m => Bool -> m ()
MonadPlus
covers all monads that have some "empty" case. For lists, this is []
; for Maybe
it is Nothing
. guard
takes a boolean; if it is False
, it evaluates to this empty value; otherwise it evaluates to return ()
. This behavior is mostly useful in do
notation:
x = do guard (not $ null rest' || null rest'' || head rest' /= '|' || head rest'' /= ']')
Or <$> a <*> b
What happens here is simple. If the condition evaluates to True
, guard returns Just ()
, which is then ignored in favor of Or <$> a <*> b
(since that's how do
notation works). However, if the condition is False
, guard
returns Nothing
, which propagates through the rest of the do
notation to give you an end result of Nothing
: exactly what you wanted.
To make the code more readable, I would also pull the condition out into its own variable in a where
block.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With