Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

EBNF to Scala parser combinator

I have the following EBNF that I want to parse:

PostfixExp      -> PrimaryExp ( "[" Exp "]" 
                                | . id "(" ExpList ")" 
                                | . length )*

And this is what I got:

def postfixExp: Parser[Expression] = (
    primaryExp ~ rep(
        "[" ~ expression ~ "]"
        | "." ~ ident ~"(" ~ repsep(expression, "," ) ~ ")" 
        | "." ~ "length") ^^ {
        case primary ~ list =>  list.foldLeft(primary)((prim,post) =>
                post match {
                    case "[" ~ length ~ "]" => ElementExpression(prim, length.asInstanceOf[Expression])
                    case "." ~ function ~"(" ~ arguments ~ ")" =>  CallMethodExpression(prim, function.asInstanceOf[String], arguments.asInstanceOf[List[Expression]])
                    case _ => LengthExpression(prim)
                }
            )
    })

But I would like to know if there is a better way, preferably without having to resort to casting (asInstanceOf).

like image 303
Daniel O Avatar asked Feb 08 '09 16:02

Daniel O


1 Answers

I would do it like this:

type E = Expression

def postfixExp = primaryExp ~ rep(
    "[" ~> expr <~ "]" ^^ { e => ElementExpression(_:E, e) }
  | "." ~ "length" ^^^ LengthExpression
  | "." ~> ident ~ ("(" ~> repsep(expr, ",") <~ ")") ^^ flatten2 { (f, args) =>
      CallMethodExpression(_:E, f, args)
    }
) ^^ flatten2 { (e, ls) => collapse(ls)(e) }

def expr: Parser[E] = ...

def collapse(ls: List[E=>E])(e: E) = {
  ls.foldLeft(e) { (e, f) => f(e) }
}

Shortened expressions to expr for brevity as well as added the type alias E for the same reason.

The trick that I'm using here to avoid the ugly case analysis is to return a function value from within the inner production. This function takes an Expression (which will be the primary) and then returns a new Expression based on the first. This unifies the two cases of dot-dispatch and bracketed expressions. Finally, the collapse method is used to merge the linear List of function values into a proper AST, starting with the specified primary expression.

Note that LengthExpression is just returned as a value (using ^^^) from its respective production. This works because the companion objects for case classes (assuming that LengthExpression is indeed a case class) extend the corresponding function value delegating to their constructor. Thus, the function represented by LengthExpression takes a single Expression and returns a new instance of LengthExpression, precisely satisfying our needs for the higher-order tree construction.

like image 64
Daniel Spiewak Avatar answered Oct 21 '22 08:10

Daniel Spiewak