Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing in Haskell for a simple interpreter

I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.

Here is the piece of code which I am having problems with

data IntExp
 = IVar Var
 | ICon Int
 | Add IntExp IntExp
 deriving (Read, Show)

whitespace = many1 (char ' ')

parseICon :: Parser IntExp
parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))

parseIVar :: Parser IntExp
parseIVar = 
  do x <- many (letter)
     prime <- string "'" <|> string ""
     return (IVar (x ++ prime))

parseIntExp :: Parser IntExp
parseIntExp =
  do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
     return x

parseAdd :: Parser IntExp
parseAdd =
  do x <- parseIntExp   
     whitespace
     string "+"
     whitespace
     y <- parseIntExp
     return (Add x y)

runP :: Show a => Parser a -> String -> IO ()
runP p input
  = case parse p "" input of
      Left err ->
        do putStr "parse error at "
           print err
      Right x -> print x

The language is slightly more complex, but this is enough to show my problem.

So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully

runP parseAdd "5 + 5"

which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons eg

runP parseAdd "n + m"

This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.

So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.

I hope I've given enough info about my problem and I was clear enough.

Thanks for any help you can give me!

like image 493
Josh Avatar asked Nov 06 '10 01:11

Josh


People also ask

What is parsing in interpreter?

In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend.

Is Haskell good for parsing?

Haskell is an excellent language for all your parsing needs. The functional nature of the language makes it easy to compose different building blocks together without worrying about nasty side effects and unforeseen consequences.

Does Haskell use interpreter?

Haskell, with its support for pattern matching on data structures, generic structure traversals, and expressive type system, is popular for implementing compilers and interpreters. Here's a selection of compilers and interpreters implemented in Haskell.

What is a parser in Haskell?

In a functional language such as Haskell, parsers can naturally be viewed as functions. type Parser = String Tree. A parser is a function that takes a string and returns some form of tree. Page 2.


2 Answers

Your problem is actually in parseICon:

parseICon =
  do x <- many (digit)
     return (ICon (read x :: Int))

The many combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read fails.


And while I'm at it, since you're new to Haskell, here's some unsolicited advice:

  • Don't use spurious parentheses. many (digit) should just be many digit. Parentheses here just group things, they're not necessary for function application.

  • You don't need to do ICon (read x :: Int). The data constructor ICon can only take an Int, so the compiler can figure out what you meant on its own.

  • You don't need try around the first two options in parseIntExp as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try) or they'll succeed after matching a single character.

  • It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.

  • It's common in Haskell to use the ($) operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ') can be written many1 $ char ' '.

Also, doing this sort of thing is redundant and unnecessary:

parseICon :: Parser IntExp
parseICon =
  do x <- many digit
     return (ICon (read x))

When all you're doing is applying a regular function to the result of a parser, you can just use fmap:

parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)

They're the exact same thing. You can make things look even nicer if you import the Control.Applicative module, which gives you an operator version of fmap, called (<$>), as well as another operator (<*>) that lets you do the same thing with functions of multiple arguments. There's also operators (<*) and (*>) that discard the right or left values, respectively, which in this case lets you parse something while discarding the result, e.g., whitespace and such.

Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:

whitespace = many1 $ char ' '

parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit

parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName

parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime

parsePrime :: Parser String
parsePrime = option "" $ string "'"

parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd

parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()

parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp
like image 177
C. A. McCann Avatar answered Sep 22 '22 06:09

C. A. McCann


I'm also new to Haskell, just wondering:

will parseIntExp ever make it to parseAdd?

It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.

e.g. runP parseIntExp "3 + m"

would try parseICon, and succeed, giving

(ICon 3) instead of (Add (ICon 3) (IVar m))

Sorry if I'm being stupid here, I'm just unsure.

like image 41
tre Avatar answered Sep 22 '22 06:09

tre