Error detection and reporting using Maybe

Question

I am writing a propositional logic parser in Haskell. I am doing the parsing by hand for now as a learning exercise. Eventually I will tackle Parsec. In the mean time, I am trying to wrap my head around Monads. In particular, I am using Maybe to report errors from my parse function. My current trouble is with part of a helper function:

parse' :: String -> (Maybe Wff, String)
parse' ('[':rest) = (x, if null rest''
                        then ""
                        else tail rest'')
        where (a, rest') = parse' rest
              (b, rest'') = parse' (if null rest'
                                    then ""
                                    else tail rest')
              x = if null rest'
                     || null rest''
                     || head rest' /= '|'
                     || head rest'' /= ']'
                  then Nothing
                  else Or <$> a <*> b

(For reference, the full parse function can be found here.)

This code parses a proposition of the form [ A | B ] where A and B are any arbitrary propositions. As you can see, I am using applicative style on the last line to propagate the Nothing result if a previous recursive call results in a Nothing. This allowed me to take out a == Nothing and b == Nothing from the if condition. How can I use the Applicative or Monad instance of Maybe to colapse the rest of this if?

Gabriella Gonzalez · Accepted Answer

I will actually do the problem backwards: I will begin from the monadic solution and work backwards from it to the hand-rolled solution. This will produce the same code that you would get if you arrived at the correct solution by hand.

The typical type signature of a monadic parser is of the form:

type Parser a = String -> Maybe (a, String)

Notice the slight difference with your form, where you have the final String on the outside of the Maybe. Both are valid, but I prefer this form more because I consider the leftovers String invalid if the parse failed.

This type is actually a special case of StateT, which is defined as:

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

Notice that if we choose:

s = String
m = Maybe

... we get back the Parser type:

type Parser a = StateT String Maybe a

-- or: type Parser = StateT String Maybe

What's cool about this is that we only need to define one parser by hand, which is the parser that retrieves a single character:

anyChar :: Parser Char
anyChar = StateT $ \str -> case str of
    []   -> Nothing
    c:cs -> Just (c, cs)

Notice that if we removed the StateT wrapper, the type of anyChar would be:

anyChar :: String -> Maybe (Char, String)

When we wrap it in StateT it becomes:

anyChar :: StateT String Maybe Char

... which is just Parser Char.

Once we have that primitive parser, we can define all the other parsers using StateT's Monad interface. For example, let's define a parser that matches a single character:

import Control.Monad

char :: Char -> Parser ()
char c' = do
    c <- anyChar
    guard (c == c')

That was easy! guard requires a MonadPlus instance for our monad, but we already have one. The reason why is thanks to the following two MonadPlus instances:

instance (MonadPlus m) => MonadPlus (StateT s m) where ...

instance MonadPlus Maybe where ...

The combination of these two instances means that StateT s Maybe automatically implements MonadPlus, so we can use guard and it will just magically do "the right thing".

With those two parser in hand, your final parser becomes very easy to write:

data Wff = Or Char Char deriving (Show)

parseWff :: Parser Wff
parseWff = do
    char '['
    a <- anyChar
    char '|'
    b <- anyChar
    char ']'
    return (Or a b)

That's much clearer and easier to understand what is going on. It also works:

>>> runStateT parseWff "[A|B]"
Just (Or 'A' 'B',"")

Working Backwards

This brings us to your original question: How do you hand-write the same behavior? We will work backwards from the Monad and MonadPlus instances to deduce what they are doing under the hood for us.

To do this, we must deduce what the Monad and MonadPlus instances for StateT reduce to when its base monad is Maybe. Let's begin from the Monad instance for StateT:

instance (Monad m) => Monad (StateT s m) where
    return r = StateT (\s -> return (r, s))

    m >>= f  = StateT $ \s0 -> do
        (a, s1) <- runStateT m s0
        runStateT (f a) s1

Notice that it is defined in generically terms of the base monad. This means we also need the Monad instance for Maybe to derive what the above code does:

instance Monad Maybe where
    return  = Just

    m >>= f = case m of
        Nothing -> Nothing
        Just a  -> f a

If we substitute the Maybe monad instance into the StateT monad instance we get:

instance Monad (StateT s Maybe) where
    return r = StateT (\s -> Just (r, s))

    m >>= f  = StateT $ \s0 -> case runStateT m s0 of
        Nothing      -> Nothing
        Just (a, s1) -> runStateT (f a) s1

We can do the same thing to derive the Monad instance for StateT s Maybe. We just have to take the MonadPlus instances for StateT and `Maybe:

instance (MonadPlus m) => MonadPlus (StateT s m) where
    mzero = StateT (\_ -> mzero)
    mplus (StateT f) (StateT g) = StateT (\s -> mplus (f s) (g s))

instance MonadPlus Maybe where
    mzero = Nothing
    mplus m1 m2 = case m1 of
        Just a  -> Just a
        Nothing -> case m2 of
            Just b  -> Just b
            Nothing -> Nothing

... and combine them into one:

instance MonadPlus (StateT s Maybe) where
    mzero = StateT (\_ -> Nothing)
    mplus (StateT f) (StateT g) = StateT $ \s -> case f s of
        Just a  -> Just a
        Nothing -> case g s of
            Just b  -> Just b
            Nothing -> Nothing

Now we can derive what our parsers are doing under the hood. Let's begin with the char parser:

char c' = do
    c <- anyChar
    guard (c == c')

This desugars to:

char c' = anyChar >>= \c -> guard (c == c')

We just derived what (>>=) does for StateT s Maybe, so lets substitute that in:

char c' = StateT $ \str0 -> case runStateT anyChar str0 of
        Nothing      -> Nothing
        Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1

We already know the definition of anyChar, so let's substitute that in:

char c' = StateT $ \str0 -> case runStateT (StateT $ \str -> case str of
        []   -> Nothing
        c:cs -> Just (c, cs) ) str0 of
    Nothing -> Nothing
    Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1

We also know that runStateT is the inverse of StateT, so:

char c' = StateT $ \str0 -> case (\str -> case str of
        []   -> Nothing
        c:cs -> Just (c, cs) ) str0 of
    Nothing -> Nothing
    Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1

We can then apply the lambda to str0:

char c' = StateT $ \str0 -> case (case str0 of
        []   -> Nothing
        c:cs -> Just (c, cs) ) of
    Nothing -> Nothing
    Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1

Now we distribute the outer case statement over the inner case statement:

char c' = StateT $ \str0 -> case str0 of
    []   -> case Nothing of
        Nothing -> Nothing
        Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1
    c:cs -> case Just (c, cs) of
        Nothing -> Nothing
        Just (a, str1) -> runStateT ((\c -> guard (c == c')) a) str1

... and evaluate the case statements:

char c' = StateT $ \str0 -> case str0 of
    []   -> Nothing
    c:cs -> runStateT ((\c -> guard (c == c')) c) cs

Then we can apply the lambda to c:

char c' = StateT $ \str0 -> case str0 of
    []   -> Nothing
    c:cs -> runStateT (guard (c == c')) cs

To simplify that further, we need to know what guard does. Here is the source code for it:

guard pred = if pred then return () else mzero

We already know what return and mzero for StateT s Maybe are, so let's substitute them in:

guard pred = if pred then StateT (\s -> Just ((), s)) else StateT (\_ -> Nothing)

Now we can inline that into our function:

char c' = StateT $ \str0 -> case str0 of
    []   -> Nothing
    c:cs -> runStateT (if (c == c')
        then StateT (\s -> Just ((), s))
        else StateT (\_ -> Nothing) ) cs

If we distribute runStateT over that we get:

char c' = StateT $ \str0 -> case str0 of
    []   -> Nothing
    c:cs -> (if (c == c')
        then (\s -> Just ((), s))
        else (\_ -> Nothing) ) cs

Similarly, we can apply both branches to cs:

char c' = StateT $ \str0 -> case str0 of
    []   -> Nothing
    c:cs -> if (c == c') then Just ((), cs)  else Nothing

That's the equivalent code we would have written by hand had we not used the Monad or MonadPlus instances at all.

The Final Parser

I will now repeat this process for the last function, but leave the derivation as an exercise for you:

parseWff = do
    char '['
    a <- anyChar
    char '|'
    b <- anyChar
    char ']'
    return (Or a b)

parseWff = StateT $ \str0 -> case str0 of
    []     -> Nothing
    c1:str1 -> if (c1 == '[')
        then case str1 of
            []      -> Nothing
            c2:str2 -> case str2 of
                []      -> Nothing
                c3:str3 -> if (c3 == '|')
                    then case str3 of
                        []      -> Nothing
                        c4:str4 -> case str4 of
                            []      -> Nothing
                            c5:str5 -> if (c5 == ']')
                                then Just (Or c2 c4, str5)
                                else Nothing
                    else Nothing
        else Nothing

... but we can further simplify that to:

parseWff = StateT $ \str0 -> case str0 of
    '[':c2:'|':c4:']':str5 -> Just (Or c2 c4, str5)
    _                      -> Nothing

Notice that, unlike the function you wrote, this doesn't use any partial functions like tail or incomplete pattern matches. Also, the code you wrote doesn't compile, but even if it did, it would still give the wrong behavior.

Tikhon Jelvis · Answer

You can use a function from Control.Monad called guard. This has a slightly odd type:

guard :: MonadPlus m => Bool -> m ()

MonadPlus covers all monads that have some "empty" case. For lists, this is []; for Maybe it is Nothing. guard takes a boolean; if it is False, it evaluates to this empty value; otherwise it evaluates to return (). This behavior is mostly useful in do notation:

x = do guard (not $ null rest' || null rest'' || head rest' /= '|' || head rest'' /= ']')
       Or <$> a <*> b

What happens here is simple. If the condition evaluates to True, guard returns Just (), which is then ignored in favor of Or <$> a <*> b (since that's how do notation works). However, if the condition is False, guard returns Nothing, which propagates through the rest of the do notation to give you an end result of Nothing: exactly what you wanted.

To make the code more readable, I would also pull the condition out into its own variable in a where block.

Error detection and reporting using Maybe

Tags:

parsing

text-parsing

haskell

Code-Apprentice

2 Answers

Working Backwards

The Final Parser

Gabriella Gonzalez

Tikhon Jelvis

Recent Activity

Donate For Us

Error detection and reporting using Maybe

Tags:

parsing

text-parsing

haskell

Code-Apprentice

2 Answers

Working Backwards

The Final Parser

Gabriella Gonzalez

Tikhon Jelvis

Related questions

Recent Activity

Donate For Us