Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parser Combinators, separating grammar and AST construction

I'm writing a simple functional programming language in Scala using the parser-combinators library.

The syntax is specified here: https://github.com/hejfelix/Frase/blob/master/src/main/scala/it/vigtig/lambda/ParserLike.scala

There is one thing which I haven't been able to solve with the implementation: how do I separate grammar definition from the transformation into AST nodes?

Would be really cool to have a close-to-human-readable grammar directly the parser source, especially considering that I'm the only programmer on the project ATM and it could serve as documentation.

How do I separate the grammar and AST-specific code?

like image 777
Felix Avatar asked Oct 31 '15 10:10

Felix


3 Answers

This is a great question, and one that I struggled with for a long time before coming up with a solution that I feel works pretty well for me.

When building a parser, I use two different syntax trees:

  • an Concrete Syntax Tree, or CST: this is a tree form of the text, and has a 1:1 correspondence with the text. Everything that appears in the text will also appear in the CST.

  • an Abstract Syntax Tree, or AST: this doesn't necessarily have a 1:1 correspondence with the text, as unnecessary textual details (such as braces, punctuation, etc.) have been removed and don't appear in the AST.

Thus, getting from the input text to the AST has two steps: the first step is to parse the input string into a CST; the second step is to convert the CST to an AST, throwing away unnecessary details.

  1. String -> CST This is where I use parser combinators. I don't do any manipulation of the tree structure in this stage. The structure of the CST is entirely determined by the combinators used. Each combinator produces a sub-tree of a certain shape, which I never change in this stage. There aren't any actions attached to combinators, so the grammar definition is clean and free of any AST information.

  2. CST -> AST This is where I massage the parse tree, extracting the important stuff, ignoring the rest. This is also where I often perform context-sensitive checks (for example: checking that a function definition doesn't have duplicate parameter names), keeping these details out of the actual parsing stage.


Example: here's a JSON parser I built using this method:

  • the general parser combinators

  • the combinator for building CST nodes

  • the actual JSON parser that builds a CST

like image 182
Matt Fenwick Avatar answered Nov 07 '22 23:11

Matt Fenwick


Well, in principle, all your AST transformation have a specific type. You could define them all elsewhere and use them from your grammar definition. It would clear up things somewhat. Alternatively, you could define your grammar definitions as "pass by name" functions that are evaluated on call, then use them from your transformation.

Basically any language allows you to break complexity by defining things somewhere and referring to them anywhere else. Since scala allows you to have functions as values this is even easier.

like image 2
Daniel Langdon Avatar answered Nov 07 '22 21:11

Daniel Langdon


Would be really cool to have a close-to-human-readable grammar directly "building" the parser source...

I wonder what that "close-to-human-readable grammar" is?

How do I separate grammar definition from the transformation into AST nodes?

What you have is a handwritten Packrat Parser.

I could be wrong, but i understand this question as a request to use a standalone grammar definition to build a parser. And then use that parser to get the syntax tree of the parsed source.

So, the grammer could be an EBNF or PEG or CFG or "your own" grammar, right?

Anyway...

  • Lets start with a "separate grammar definition", e.g. EBNF.

  • Then you need a parser for the grammar, e.g. an EBNFParser.

  • Parsing the grammar with that parser results is an internal structure of that grammar: a syntax tree.

  • Given a syntax tree for a valid grammar, you may return an association list with the keys (as the meta identifiers) and attach grammar rules to them.

    foreach grammar key add matching grammar rule

  • That means that you need to pick a Grammar Rule identified by RuleName and add its Rule to a "Constructed Parser".

  • At the end: you have a "Constructed Parser" assembled from individual "Grammar Rules" able to parse Source defined by the given Grammar.

  • Parsing the Source, gives you a Syntax Tree for the source.

Pass 1

Grammar -> GrammarParser -> GrammarTree -> GrammarRules -> ConstructedParserForGrammar

Pass 2

Source -> ConstructedParserForGrammar -> Syntax Tree -> Transformations...

In other words: its quite a puzzle to go from BNF to an automatically constructed Packrat parser.

like image 1
Jens A. Koch Avatar answered Nov 07 '22 22:11

Jens A. Koch