I'm relatively new to Haskell with main programming background coming from OO languages. I am trying to write an interpreter with a parser for a simple programming language. So far I have the interpreter at a state which I am reasonably happy with, but am struggling slightly with the parser.
Here is the piece of code which I am having problems with
data IntExp
= IVar Var
| ICon Int
| Add IntExp IntExp
deriving (Read, Show)
whitespace = many1 (char ' ')
parseICon :: Parser IntExp
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
parseIVar :: Parser IntExp
parseIVar =
do x <- many (letter)
prime <- string "'" <|> string ""
return (IVar (x ++ prime))
parseIntExp :: Parser IntExp
parseIntExp =
do x <- try(parseICon)<|>try(parseIVar)<|>parseAdd
return x
parseAdd :: Parser IntExp
parseAdd =
do x <- parseIntExp
whitespace
string "+"
whitespace
y <- parseIntExp
return (Add x y)
runP :: Show a => Parser a -> String -> IO ()
runP p input
= case parse p "" input of
Left err ->
do putStr "parse error at "
print err
Right x -> print x
The language is slightly more complex, but this is enough to show my problem.
So in the type IntExp ICon is a constant and IVar is a variable, but now onto the problem. This for example runs successfully
runP parseAdd "5 + 5"
which gives (Add (ICon 5) (ICon 5)), which is the expected result. The problem arises when using IVars rather than ICons eg
runP parseAdd "n + m"
This causes the program to error out saying there was an unexpected "n" where a digit was expected. This leads me to believe that parseIntExp isn't working as I intended. My intention was that it will try to parse an ICon, if that fails then try to parse an IVar and so on.
So I either think the problem exists in parseIntExp, or that I am missing something in parseIVar and parseICon.
I hope I've given enough info about my problem and I was clear enough.
Thanks for any help you can give me!
In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend.
Haskell is an excellent language for all your parsing needs. The functional nature of the language makes it easy to compose different building blocks together without worrying about nasty side effects and unforeseen consequences.
Haskell, with its support for pattern matching on data structures, generic structure traversals, and expressive type system, is popular for implementing compilers and interpreters. Here's a selection of compilers and interpreters implemented in Haskell.
In a functional language such as Haskell, parsers can naturally be viewed as functions. type Parser = String Tree. A parser is a function that takes a string and returns some form of tree. Page 2.
Your problem is actually in parseICon
:
parseICon =
do x <- many (digit)
return (ICon (read x :: Int))
The many
combinator matches zero or more occurrences, so it's succeeding on "m" by matching zero digits, then probably dying when read
fails.
And while I'm at it, since you're new to Haskell, here's some unsolicited advice:
Don't use spurious parentheses. many (digit)
should just be many digit
. Parentheses here just group things, they're not necessary for function application.
You don't need to do ICon (read x :: Int)
. The data constructor ICon
can only take an Int
, so the compiler can figure out what you meant on its own.
You don't need try
around the first two options in parseIntExp
as it stands--there's no input that would result in either one consuming some input before failing. They'll either fail immediately (which doesn't need try
) or they'll succeed after matching a single character.
It's usually a better idea to tokenize first before parsing. Dealing with whitespace at the same time as syntax is a headache.
It's common in Haskell to use the ($)
operator to avoid parentheses. It's just function application, but with very low precedence, so that something like many1 (char ' ')
can be written many1 $ char ' '
.
Also, doing this sort of thing is redundant and unnecessary:
parseICon :: Parser IntExp
parseICon =
do x <- many digit
return (ICon (read x))
When all you're doing is applying a regular function to the result of a parser, you can just use fmap
:
parseICon :: Parser IntExp
parseICon = fmap (ICon . read) (many digit)
They're the exact same thing. You can make things look even nicer if you import the Control.Applicative
module, which gives you an operator version of fmap
, called (<$>)
, as well as another operator (<*>)
that lets you do the same thing with functions of multiple arguments. There's also operators (<*)
and (*>)
that discard the right or left values, respectively, which in this case lets you parse something while discarding the result, e.g., whitespace and such.
Here's a lightly modified version of your code with some of the above suggestions applied and some other minor stylistic tweaks:
whitespace = many1 $ char ' '
parseICon :: Parser IntExp
parseICon = ICon . read <$> many1 digit
parseIVar :: Parser IntExp
parseIVar = IVar <$> parseVarName
parseVarName :: Parser String
parseVarName = (++) <$> many1 letter <*> parsePrime
parsePrime :: Parser String
parsePrime = option "" $ string "'"
parseIntExp :: Parser IntExp
parseIntExp = parseICon <|> parseIVar <|> parseAdd
parsePlusWithSpaces :: Parser ()
parsePlusWithSpaces = whitespace *> string "+" *> whitespace *> pure ()
parseAdd :: Parser IntExp
parseAdd = Add <$> parseIntExp <* parsePlusWithSpaces <*> parseIntExp
I'm also new to Haskell, just wondering:
will parseIntExp ever make it to parseAdd?
It seems like ICon or IVar will always get parsed before reaching 'parseAdd'.
e.g. runP parseIntExp "3 + m"
would try parseICon, and succeed, giving
(ICon 3) instead of (Add (ICon 3) (IVar m))
Sorry if I'm being stupid here, I'm just unsure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With