Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is an appropriate data structure or algorithm for producing an immutable concrete syntax tree in a functionally pure manner?

Given a LL(1) grammar what is an appropriate data structure or algorithm for producing an immutable concrete syntax tree in a functionally pure manner? Please feel free to write example code in whatever language you prefer.

My Idea

symbol : either a token or a node

result : success or failure

token : a lexical token from source text
    value -> string : the value of the token
    type -> integer : the named type code of the token
    next -> token : reads the next token and keeps position of the previous token
    back -> token : moves back to the previous position and re-reads the token

node : a node in the syntax tree 
    type -> integer : the named type code of the node
    symbols -> linkedList : the symbols found at this node
    append -> symbol -> node : appends the new symbol  to a new copy of the node

Here is an idea I have been thinking about. The main issue here is handling syntax errors. I mean I could stop at the first error but that doesn't seem right.

let program token =
    sourceElements (node nodeType.program) token        

let sourceElements node token =
    let (n, r) = sourceElement (node.append (node nodeType.sourceElements)) token
    match r with
    | success -> (n, r) 
    | failure -> // ???     

let sourceElement node token =
    match token.value with
    | "function" -> 
        functionDeclaration (node.append (node nodeType.sourceElement)) token
    | _ -> 
        statement (node.append (node nodeType.sourceElement)) token 

Please Note

I will be offering up a nice bounty to the best answer so don't feel rushed. Answers that simply post a link will have less weight over answers that show code or contain detailed explanations.

Final Note

I am really new to this kind of stuff so don't be afraid to call me a dimwit.

like image 934
ChaosPandion Avatar asked Aug 05 '10 14:08

ChaosPandion


People also ask

What is a syntax tree explain the procedure for constructing a syntax tree with the help of an example?

The syntax tree is shortened form of the Parse Tree. Example1 − Draw Syntax Tree for the string a + b ∗ c − d. Each node in a syntax tree can be executed as data with multiple fields. In the node for an operator, one field recognizes the operator and the remaining field includes a pointer to the nodes for the operands.

What is concrete syntax tree in compiler design?

Originally, Concrete Syntax Tree (CST) is used for representation of a source code. CST represents concrete source code elements attached to corresponding construction in language syntax.

How do you construct a syntax tree for an expression?

CONSTRUCTING SYNTAX TREES FOR EXPRESSIONS. Each node in a syntax tree for an (arithmetic) expression is a record with several fields. In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the nodes of the operands. The operator is often called the label of the node.


2 Answers

You want to parse something into an abstract syntax tree.

In the purely functional programming language Haskell, you can use parser combinators to express your grammar. Here an example that parses a tiny expression language:

EDIT Use monadic style to match Graham Hutton's book

    -- import a library of *parser combinators*
import Parsimony
import Parsimony.Char
import Parsimony.Error
(+++) = (<|>)

    -- abstract syntax tree
data Expr = I Int
          | Add Expr Expr
          | Mul Expr Expr
          deriving (Eq,Show)

    -- parse an expression
parseExpr :: String -> Either ParseError Expr
parseExpr = Parsimony.parse pExpr
    where

        -- grammar
    pExpr :: Parser String Expr
    pExpr = do
        a <- pNumber +++ parentheses pExpr  -- first argument
        do
            f <- pOp                        -- operation symbol
            b <- pExpr                      -- second argument
            return (f a b)
         +++ return a                       -- or just the first argument

    parentheses parser = do                 -- parse inside parentheses
        string "("
        x <- parser
        string ")"
        return x

    pNumber = do                            -- parse an integer
        digits <- many1 digit
        return . I . read $ digits

    pOp =                                   -- parse an operation symbol
         do string "+"
            return Add
        +++ 
         do string "*"
            return Mul

Here an example run:

*Main> parseExpr "(5*3)+1"
Right (Add (Mul (I 5) (I 3)) (I 1))

To learn more about parser combinators, see for example chapter 8 of Graham Hutton's book "Programming in Haskell" or chapter 16 of "Real World Haskell".

Many parser combinator library can be used with different token types, as you intend to do. Token streams are usually represented as lists of tokens [Token].

like image 106
Heinrich Apfelmus Avatar answered Nov 09 '22 01:11

Heinrich Apfelmus


Definitely check out the monadic parser combinator approach; I've blogged about it in C# and in F#.

like image 23
Brian Avatar answered Nov 09 '22 02:11

Brian