I have been searching around the interwebs for a couple of days, trying to get an answer to my questions and i'm finally admitting defeat.
I have been given a grammar:
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
And i have been told to parse, evaluate and print expressions using this grammar
where the operators * + -
has their normal meaning
The specific task is to write a function parse :: String -> AST
that takes a string as input and returns an abstract syntax tree when the input is in the correct format (which i can asume it is).
I am told that i might need a suitable data type and that data type might need to derive from some other classes.
Following an example outputdata AST = Leaf Int | Sum AST AST | Min AST | ...
Further more, i should consider writing a functiontokens::String -> [String]
to split the input string into a list of tokens
Parsing should be accomplished withast::[String] -> (AST,[String])
where the input is a list of tokens and it outputs an AST, and to parse sub-expressions i should simply use the ast function recursively.
I should also make a printExpr method to print the result so thatprintE: AST -> String
printE(parse "* 5 5")
yields either "5*5"
or "(5*5)"
and also a function to evaluate the expressionevali :: AST -> Int
I would just like to be pointed in the right direction of where i might start. I have little knowledge of Haskell and FP in general and trying to solve this task i made some string handling function out of Java which made me realize that i'm way off track.
So a little pointer in the right direction, and maybe an explantion to 'how' the AST should look like
Third day in a row and still no running code, i really appreciate any attempt to help me find a solution!
Thanks in advance!
Edit
I might have been unclear: I'm wondering how i should go about from having read and tokenized an input string to making an AST.
OK, let's take your grammar
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
This is a nice easy grammar, because you can tell from the first token what sort of epression it will be.
(If there was something more complicated, like +
coming between numbers, or -
being used for subtraction as
well as negation, you'd need the list-of-successes trick, explained in
Functional Parsers.)
Let's have some sample raw input:
rawinput = "- 6 + 45 let x = - 5 in * x x"
Which I understand from the grammar represents "(- 6 (+ 45 (let x=-5 in (* x x))))"
,
and I'll assume you tokenised it as
tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]
which fits the grammar, but you might well have got
tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]
which fits your sample AST
better. I think it's good practice to name your AST after bits of your grammar,
so I'm going to go ahead and replace
data AST = Leaf Int | Sum AST AST | Min AST | ...
with
data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char
| E_Let {letvar::Char,letequal:: Expr,letin::Expr}
deriving Show
I've named the bits of an E_Let
to make it clearer what they represent.
You could use isDigit
by adding import Data.Char (isDigit)
to help out:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
| s == "-" = let (e,ss') = expr ss in (E_Neg e,ss')
| s == "+" = (E_Sum e e',ss'') where
(e,ss') = expr ss
(e',ss'') = expr ss'
-- more cases
Yikes! Too many let clauses obscuring the meaning,
and we'll be writing the same code for E_Prod
and very much worse for E_Let
.
Let's get this sorted out!
The standard way of dealing with this is to write some combinators;
instead of tiresomely threading the input [String]
s through our definition, define ways to
mess with the output of parsers (map) and combine
multiple parsers into one (lift).
First we should define pmap
, our own equivalent of the map
function so we can do pmap E_Neg (expr1 ss)
instead of let (e,ss') = expr1 ss in (E_Neg e,ss')
pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))
nonono, I can't even read that! We need a type synonym:
type Parser a = [String] -> (a,[String])
pmap :: (a -> b) -> Parser a -> Parser b
pmap f p = \ss -> let (a,ss') = p ss
in (f a,ss')
But really this would be better if I did
data Parser a = Par [String] -> (a,[String])
so I could do
instance Functor Parser where
fmap f (Par p) = Par (pmap f p)
I'll leave that for you to figure out if you fancy.
We also need to deal with the situation when we have two parsers to run, and we want to combine their results using a function. This is called lifting the function to parsers.
liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftP2 f p1 p2 = \ss0 -> let
(a,ss1) = p1 ss0
(b,ss2) = p2 ss1
in (f a b,ss2)
or maybe even three parsers:
liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d
I'll let you think how to do that.
In the let statement you'll need liftP5
to parse the sections of a let statement,
lifting a function that ignores the "="
and "in"
. You could make
equals_ :: Parser ()
equals_ [] = error "equals_: expected = but got end of input"
equals_ ("=":ss) = ((),ss)
equals_ (s:ss) = error $ "equals_: expected = but got "++s
and a couple more to help out with this.
Actually, pmap
could also be called liftP1
, but map is the traditional name for that sort of thing.
Now we're ready to clean up expr
:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
| s == "-" = pmap E_Neg expr ss
| s == "+" = liftP2 E_Sum expr expr ss
-- more cases
That'd all work fine. Really, it's OK. But liftP5
is going to be a bit long, and feels messy.
Applicative Functors is the way to go. Remember I suggested refactoring as
data Parser a = Par [String] -> (a,[String])
so you could make it an instance of Functor
? Perhaps you don't want to,
because all you've gained is a new name fmap
for the perfectly working pmap
and
you have to deal with all those Par
constructors cluttering up your code.
Perhaps this will make you reconsider, though; we can import Control.Applicative
,
then using the data
declaration, we can
define <*>
, which sort-of means then
and use <$>
instead of pmap
, with *>
meaning
<*>-but-forget-the-result-of-the-left-hand-side
so you would write
expr (s:ss) | s == "let" = E_Let <$> var *> equals_ <*> expr <*> in_ *> expr
Which looks a lot like your grammar definition, so it's easy to write code that works first time.
This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things.
You'd only have to define fmap
, <*>
and pure
, all simple ones, and no long repetative liftP3
, liftP4
etc.
Read up about Applicative Functors. They're great.
Hints for making Parser applicative: pure
doesn't change the list.
<*>
is like liftP2
, but the function doesn't come from outside, it comes as the output from p1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With