Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Writing a parser from scratch in Haskell

Is there any good tutorial for writing a parser for a given grammar in Haskell from scratch?

I found:

  1. parsing expressions and statements (HaskellWiki)

  2. Parsing a simple imperative language (HaskellWiki)

  3. Using parsec (Real World Haskell)

but all of these use the parsec library, and while that may be interesting for industrial applications I'm specifically looking for examples that are 'bottom up' without the use of sophisticated libraries.

The only one I found which uses 'basic' Haskell is this: Parsing with Haskell which uses some very foreign syntax (it's very hard to distinguish whats part of the program or whats only 'pseudocode') and there's no explicit grammar definition.

Any suggestions are highly appreciated!

like image 976
jules Avatar asked Dec 18 '13 14:12

jules


People also ask

Is Haskell good for parsing?

Haskell is an excellent language for all your parsing needs. The functional nature of the language makes it easy to compose different building blocks together without worrying about nasty side effects and unforeseen consequences.

What is generator parser?

A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar. The generated code is a parser, which takes a sequence of characters and tries to match the sequence against the grammar.

What is meant by parser?

A parser is a compiler or interpreter component that breaks data into smaller elements for easy translation into another language. A parser takes input in the form of a sequence of tokens, interactive commands, or program instructions and breaks them up into parts that can be used by other components in programming.


1 Answers

It's actually surprisingly easy to build Parsec-from-scratch. The actual library code itself is heavily generalized and optimized which contorts the core abstraction, but if you're just building things from scratch to understand more about what's going on you can write it in just a few lines of code. I'll build a slightly weaker Applicative parser below.

Essentially, we want to produce a Applicative, Parser along with a primitive parser value

satisfy :: (Char -> Bool) -> Parser Char 

and a few combinators such as try, which "undoes" a parser if it fails

try :: Parser a -> Parser a 

and orElse which allows us to continue with a second parser if the first parser fails. Usually this is actually written with the infix combinator (<|>)

orElse, (<|>) :: Parser a -> Parser a -> Parser a 

Since our Applicative needs to keep track of the current stream state and be able to fail, we'll build it by combining the state Applicative and the Either applicative.

type Error = String newtype Parser a = P { unP :: String -> (String, Either Error a) }  instance Functor Parser where   fmap f (P st) = P $ \stream -> case st stream of     (res, Left err) -> (res, Left err)     (res, Right a ) -> (res, Right (f a))  instance Applicative Parser where   pure a = P (\stream -> (stream, Right a))   P ff <*> P xx = P $ \stream0 -> case ff stream0 of   -- produce an f     (stream1, Left err) -> (stream1, Left err)     (stream1, Right f ) -> case xx stream1 of          -- produce an x       (stream2, Left err) -> (stream2, Left err)       (stream2, Right x ) -> (stream2, Right (f x))    -- return (f x) 

If we follow the (<*>) method in the Applicative instance carefully we see that it just passes the stream into the f-producing Parser, takes the result stream and, if successful, passes it to the x-producing Parser and if they both succeed it returns their application (f x). This means that if we have a function-producing parser and an argument-producing parser we can sequence them with (<*>)

-- given parseChar :: Char -> Parser Char  parseHi   :: Parser (Char, Char) -- parses 'h' then 'i' parseHi = pure (,) <$> parseChar 'h' <*> parseChar 'i' 

We can use the mechanics of this Applicative to build the needed combinators as well. Here's satisfy

-- | Peek at the next character and return successfully if it satisfies a predicate satisfy :: (Char -> Bool) -> Parser Char satisfy f = P $ \stream -> case stream of   []                 -> ([], Left "end of stream")   (c:cs) | f c       -> (cs, Right c)          | otherwise -> (cs, Left "did not satisfy") 

And here's try

-- | Run a parser but if it fails revert the stream to it's original state try :: Parser a -> Parser a try (P f) = P $ \stream0 -> case f stream0 of   (_      , Left err) -> (stream0, Left err)   (stream1, Right a ) -> (stream1, Right a ) 

And here's orElse

orElse :: Parser a -> Parser a -> Parser a orElse (P f1) (P f2) = P $ \stream0 -> case f1 stream0 of   (stream1, Left err) -> f2 stream1   (stream1, Right a ) -> (stream1, Right a) 

Typically at this point we also note that Parser forms an Alternative instance with orElse if we also provide an immediately failing parser, empty

instance Alternative Parser where   empty = P $ \stream -> (stream, Left "empty")   (<|>) = orElse    many = manyParser   some = someParser 

And we can write manyParser and someParser as combinators which run a parser repeatedly.

-- | 0 or more manyParser :: Parser a -> Parser [a] manyParser (P f) = P go where   go stream = case f stream of     (_      , Left err) -> (stream, Right [])  -- throws away the error     (stream', Right a ) -> case go stream' of        (streamFin, Left err) -> (streamFin, Left err)       (streamFin, Right as) -> (streamFin, Right (a : as))  -- | 1 or more someParser :: Parser a -> Parser [a] someParser (P f) = P $ \stream -> case f stream of   (stream', Left err) -> (stream', Left err)   (stream', Right a ) ->      let (P fmany) = manyParser (P f)     in case fmany stream' of       (stream'', Left err) -> (stream'', Left err)       (stream'', Right as) -> (stream'', Right (a:as)) 

And from here we can begin to work at much higher levels of abstraction.

char :: Char -> Parser Char char c = satisfy (== c)  string :: String -> Parser String string []     = pure [] string (c:cs) = (:) <$> char c <*> string cs  oneOf :: [Char] -> Parser Char oneOf cs = satisfy (`elem` cs)  parens :: Parser a -> Parser a parens parseA = dropFirstAndLast <$> char '(' <*> parseA <*> char ')'   where     dropFirstAndLast _ a _ = a 
like image 147
J. Abrahamson Avatar answered Oct 07 '22 02:10

J. Abrahamson