Is there any good tutorial for writing a parser for a given grammar in Haskell from scratch?
I found:
parsing expressions and statements (HaskellWiki)
Parsing a simple imperative language (HaskellWiki)
Using parsec (Real World Haskell)
but all of these use the parsec library, and while that may be interesting for industrial applications I'm specifically looking for examples that are 'bottom up' without the use of sophisticated libraries.
The only one I found which uses 'basic' Haskell is this: Parsing with Haskell which uses some very foreign syntax (it's very hard to distinguish whats part of the program or whats only 'pseudocode') and there's no explicit grammar definition.
Any suggestions are highly appreciated!
Haskell is an excellent language for all your parsing needs. The functional nature of the language makes it easy to compose different building blocks together without worrying about nasty side effects and unforeseen consequences.
A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar. The generated code is a parser, which takes a sequence of characters and tries to match the sequence against the grammar.
A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language. A parser takes input in the form of a sequence of tokens, interactive commands, or program instructions and breaks them up into parts that can be used by other components in programming.
It's actually surprisingly easy to build Parsec-from-scratch. The actual library code itself is heavily generalized and optimized which contorts the core abstraction, but if you're just building things from scratch to understand more about what's going on you can write it in just a few lines of code. I'll build a slightly weaker Applicative
parser below.
Essentially, we want to produce a Applicative
, Parser
along with a primitive parser value
satisfy :: (Char -> Bool) -> Parser Char
and a few combinators such as try
, which "undoes" a parser if it fails
try :: Parser a -> Parser a
and orElse
which allows us to continue with a second parser if the first parser fails. Usually this is actually written with the infix combinator (<|>)
orElse, (<|>) :: Parser a -> Parser a -> Parser a
Since our Applicative
needs to keep track of the current stream state and be able to fail, we'll build it by combining the state Applicative
and the Either
applicative.
type Error = String newtype Parser a = P { unP :: String -> (String, Either Error a) } instance Functor Parser where fmap f (P st) = P $ \stream -> case st stream of (res, Left err) -> (res, Left err) (res, Right a ) -> (res, Right (f a)) instance Applicative Parser where pure a = P (\stream -> (stream, Right a)) P ff <*> P xx = P $ \stream0 -> case ff stream0 of -- produce an f (stream1, Left err) -> (stream1, Left err) (stream1, Right f ) -> case xx stream1 of -- produce an x (stream2, Left err) -> (stream2, Left err) (stream2, Right x ) -> (stream2, Right (f x)) -- return (f x)
If we follow the (<*>)
method in the Applicative
instance carefully we see that it just passes the stream into the f
-producing Parser
, takes the result stream and, if successful, passes it to the x
-producing Parser
and if they both succeed it returns their application (f x)
. This means that if we have a function-producing parser and an argument-producing parser we can sequence them with (<*>)
-- given parseChar :: Char -> Parser Char parseHi :: Parser (Char, Char) -- parses 'h' then 'i' parseHi = pure (,) <$> parseChar 'h' <*> parseChar 'i'
We can use the mechanics of this Applicative
to build the needed combinators as well. Here's satisfy
-- | Peek at the next character and return successfully if it satisfies a predicate satisfy :: (Char -> Bool) -> Parser Char satisfy f = P $ \stream -> case stream of [] -> ([], Left "end of stream") (c:cs) | f c -> (cs, Right c) | otherwise -> (cs, Left "did not satisfy")
And here's try
-- | Run a parser but if it fails revert the stream to it's original state try :: Parser a -> Parser a try (P f) = P $ \stream0 -> case f stream0 of (_ , Left err) -> (stream0, Left err) (stream1, Right a ) -> (stream1, Right a )
And here's orElse
orElse :: Parser a -> Parser a -> Parser a orElse (P f1) (P f2) = P $ \stream0 -> case f1 stream0 of (stream1, Left err) -> f2 stream1 (stream1, Right a ) -> (stream1, Right a)
Typically at this point we also note that Parser
forms an Alternative
instance with orElse
if we also provide an immediately failing parser, empty
instance Alternative Parser where empty = P $ \stream -> (stream, Left "empty") (<|>) = orElse many = manyParser some = someParser
And we can write manyParser
and someParser
as combinators which run a parser repeatedly.
-- | 0 or more manyParser :: Parser a -> Parser [a] manyParser (P f) = P go where go stream = case f stream of (_ , Left err) -> (stream, Right []) -- throws away the error (stream', Right a ) -> case go stream' of (streamFin, Left err) -> (streamFin, Left err) (streamFin, Right as) -> (streamFin, Right (a : as)) -- | 1 or more someParser :: Parser a -> Parser [a] someParser (P f) = P $ \stream -> case f stream of (stream', Left err) -> (stream', Left err) (stream', Right a ) -> let (P fmany) = manyParser (P f) in case fmany stream' of (stream'', Left err) -> (stream'', Left err) (stream'', Right as) -> (stream'', Right (a:as))
And from here we can begin to work at much higher levels of abstraction.
char :: Char -> Parser Char char c = satisfy (== c) string :: String -> Parser String string [] = pure [] string (c:cs) = (:) <$> char c <*> string cs oneOf :: [Char] -> Parser Char oneOf cs = satisfy (`elem` cs) parens :: Parser a -> Parser a parens parseA = dropFirstAndLast <$> char '(' <*> parseA <*> char ')' where dropFirstAndLast _ a _ = a
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With