Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AST and Parsing in Haskell

I have an assignment and I can't figure out how to define the answer.

The assignment

Write the function exp:: [String] -> (AST, [String])

AST:

  • If x is a number, it should say Number x.
  • If it is a "+" og a "-" it should say Atom x.
  • If it reads a "(", then all the contents behind the "(" until it comes a ")" should be a List [AST].

so that the outputs will be:

exp (token "(hi (4) 32)")
> (List [Atom "hi", List [Number 4], Number 32], [])

exp (token "(+ 3 42 654 2)") 
> (List [Atom "+", Number 3, Number 42, Number 654, Number 2], [])

exp (token "(+ 21 444) junk") 
> (List [Atom "+", Number 21, Number 444], ["junk"])

what I have so far

I already have the token-function, token :: String -> [String] which makes a list.

Example:
`token "( + 2 ( + 2 3 ) )"
> ["(","+","2","(","+","2","3",")",")"]`

The exp function looks like this:

exp :: [String] -> (AST, [String])
exp [] = error "Empty list"
exp (x:xs)  | x == ")"      = error ""
            | x == "("      = let (e, ss') = exp xs in (List [getAst xs], ss')
            | x == "+"      = let (e, ss') = exp xs in (Atom (read x), ss')
            | x == "-"      = let (e, ss') = exp xs in (Atom (read x), ss')
            | otherwise     = exp xs`

where the getAst function:

getAst :: [String] -> AST
getAst [] = error ""
getAst (x:xs)
            | x == ")"  = error ""
            | x == "("  = (List [getAst xs])
            | isAtom x  = (Atom x) 
            | isNum x   = (Number (read x))
            | otherwise = getAst xs`

(And yes, I am a beginner in Haskell..)

like image 794
anfo Avatar asked Oct 07 '14 07:10

anfo


People also ask

What is AST in Haskell?

haskell-tools-ast: Haskell AST for efficient toolingA representation of a Haskell Syntax tree that contain source-related and semantic annotations. These annotations help developer tools to work with the defined program.

Is Haskell good for parsing?

Haskell is an excellent language for all your parsing needs. The functional nature of the language makes it easy to compose different building blocks together without worrying about nasty side effects and unforeseen consequences.

What is a parser in Haskell?

Parser combinators are known to be simple to use without requiring external tools or too many concepts to learn. That is, they are ordinary Haskell constructors that can easily be combined and returned with other parsers because of their nature as functions.

What is a token in Haskell?

(mandatory) The %token directive is used to tell Happy about all the terminal symbols used in the grammar. Each terminal has a name, by which it is referred to in the grammar itself, and a Haskell representation enclosed in braces. Each of the patterns must be of the same type, given by the %tokentype directive.


1 Answers

I think I can try to help you out a bit.

The way the problem is represented you should be able to do this by just looking at the next input/token and decide from there where to go.

some assumptions

The way the data is represented as [String] -> (Ast, [String]) I assume it's a common parser, where a parser tries to read some parts of the input and return the parsed/transformed output together with the rest of the input that it did not transform (so just the two parsts of the tuple - the Ast and the rest of the input).

the AST type

as you did not include it I assumed it to be:

data Ast
  = Number Int
  | Atom String
  | List [Ast]
  deriving Show

some things I gonna need

I need a few things:

import Prelude hiding (exp)

import Control.Applicative ((<$>))
import Data.Maybe (fromJust, isJust)

I have to hide exp as we want to use this as a function-name.

Then I want to fmap over a Maybe so I include the operator from Control.Applicative. This is really just this, in case you did not see it before:

f <$> Nothing = Nothing
f <$> Just a  = Just (f a)

I want some helpers for Maybe:

  • isJust to check if for Just _
  • and fromJust to get the a from Just a

Finally I need this helper-function to read a bit more safe:

tryRead :: (Read a) => String -> Maybe a
tryRead input =
  case readsPrec 0 input of
    (a,_):_ -> Just a
    _       -> Nothing

This will try to read a number here - returing Just n if n is a number and Nothing otherwise.

a first go

Here is a unfinished first go at your problem:

exp :: [String] -> (Ast, [String])
exp (lookat:rest)
  | isJust number = (fromJust number, rest)
  | lookat == "("  = parseList rest []
  where number = Number <$> tryRead lookat

parseList :: [String] -> [Ast] -> (Ast, [String])
parseList inp@(lookat:rest) acc
  | lookat == ")" = (List (reverse acc), rest)
  | otherwise    = let (el, rest') = exp inp
                   in parseList rest' (el:acc)

As you can see I just branch based on lookat but with a slight twist:

In case I see a number I return the number and the rest-token-list. In case I see a ( I start another parser parseList.

parseList will do the same: - it looks at the first token - if the token is a ) it finishes the current list (it uses the accumulator technique for this) and returns. - if not it uses the existing exp parser to get the elements of the list recursively.

Here is an example run:

λ> let input = ["(", "2", "(", "3", "4", ")", "5", ")"]

λ> exp input
(List [Number 2,List [Number 3,Number 4],Number 5],[])

TODO

There are some border cases left you have to decide on (what if there are no input tokens?).

And of course you have to add the case for Atoms - to finish this excecise.

full solution

ok - 3 hours later the OP did not check in again so I guess I can post a complete solution. I hope I did not forget any edge cases and this sureley is not the most efficient implementation (tokens comes to mind) - but the examples the OP gave all match:

module Ast where

import Prelude hiding (exp)

import Control.Applicative ((<$>))
import Data.Char (isSpace, isControl)
import Data.Maybe (fromJust, isJust)

data Ast
  = Number Int
  | Atom String
  | List [Ast]
  | Empty
  deriving Show

type Token = String

main :: IO ()
main = do
  print $ parse "(hi (4) 32)"
  print $ parse "(+ 3 42 654 2)"
  print $ parseAst . tokens $ "(+ 21 444) junk"

parse :: String -> Ast
parse = fst . parseAst . tokens

parseAst :: [Token] -> (Ast, [Token])
parseAst [] = (Empty, [])
parseAst (lookat:rest)
  | isJust number = (fromJust number, rest)
  | lookat == "("  = parseList rest []
  | otherwise     = (Atom lookat, rest)
  where number = Number <$> tryRead lookat

parseList :: [Token] -> [Ast] -> (Ast, [Token])
parseList [] _ = error "Syntax error: `)` not found"
parseList inp@(lookat:rest) acc
  | lookat == ")" = (List (reverse acc), rest)
  | otherwise    = let (el, rest') = parseAst inp
                   in parseList rest' (el:acc)
tokens :: String -> [Token]
tokens = split ""
  where split tok "" = add tok []
        split tok (c:cs)
          | c == '(' || c == ')' = add tok $ [c] : split "" cs
          | isSpace c || isControl c = add tok $ split "" cs
          | otherwise = split (tok ++ [c]) cs
        add "" tks = tks
        add t tks =  t : tks

tryRead :: (Read a) => Token -> Maybe a
tryRead input =
  case readsPrec 0 input of
    (a,_):_ -> Just a
    _       -> Nothing

example run

λ> :main
List [Atom "hi",List [Number 4],Number 32]
List [Atom "+",Number 3,Number 42,Number 654,Number 2]
(List [Atom "+",Number 21,Number 444],["junk"])
like image 198
Random Dev Avatar answered Oct 21 '22 08:10

Random Dev