Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unparse AST < O(exp(n))?

Abstract problem description:

The way I see it, unparsing means to create a token stream from an AST, which when parsed again produces an equal AST.

So parse(unparse(AST)) = AST holds.

This is the equal to finding a valid parse tree which would produce the same AST.

The language is described by a context free S-attributed grammar using a eBNF variant.

So the unparser has to find a valid 'path' through the traversed nodes in which all grammar constraints hold. This bascially means to find a valid allocation of AST nodes to grammar production rules. This is a constraint satisfaction problem (CSP) in general and could be solved, like parsing, by backtracking in O(exp(n)).

Fortunately for parsing, this can be done in O(n³) using GLR (or better restricting the grammar). Because the AST structure is so close to the grammar production rule structure, I was really surprised seeing an implementation where the runtime is worse than parsing: XText uses ANTLR for parsing and backtracking for unparsing.

Questions

  1. Is a context free S-attribute grammar everything a parser and unparser need to share or are there further constraints, e.g. on the parsing technique / parser implementation?
  2. I've got the feeling this problem isn't O(exp(n)) in general - could some genius help me with this?
  3. Is this basically a context-sensitive grammar?

Example1:

Area    returns AnyObject   -> Pedestrian | Highway
Highway returns AnyObject   -> "Foo" Car
Pedestrian  returns AnyObject   -> "Bar" Bike
Car     returns Vehicle     -> anyObjectInstance.name="Car"
Bike    returns Vehicle     -> anyObjectInstance.name="Bike" 

So if I have an AST containing

AnyObject -> AnyObject -> Vehicle [name="Car"] and I know the root can be Area, I could resolve it to

Area    -> Highway  -> Car

So the (Highway | Pedestrian) decision depends on the subtree decisions. The problem get's worse when a leaf might be, at first sight, one of several types, but has to be a specific one to form a valid path later on.

Example2:

So if I have S-attribute rules returning untyped objects, just assigning some attributes, e.g.

A -> B C   {Obj, Obj}
X -> Y Z   {Obj, Obj}
B -> "somekeyword"  {0}
Y -> "otherkeyword" {0}
C -> "C" {C}
Z -> "Z" {Z}

So if an AST contains

   Obj
  /  \
"0"  "C"

I can unparse it to

   A
  / \
 B   C

just after I could resolve "C" to C.

While traversing the AST, all constraints I can generate from the grammar are satisfied for both rules, A and X, until I hit "C". This means that for

A -> B | C 
B -> "map"  {MagicNumber_42}
C -> "foreach" {MagicNumber_42}

both solutions for the tree

     Obj
      |
 MagicNumber_42

are valid and it is considered that they have equal semantics ,e.g. syntactic sugar.

Further Information:

  • unparsing in XText

  • grammar constraints for unparsing, see Serializer: Concrete Syntax Validation

like image 573
Stefan K. Avatar asked Aug 12 '12 01:08

Stefan K.


People also ask

What is ast in coding?

Abstract Syntax Trees or ASTs are tree representations of code. They are a fundamental part of the way a compiler works. When a compiler transforms some code there are fundamentally the following steps: Lexical Analysis.

What does ast mean in Python?

Source code: Lib/ast.py. The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.

How do you use ast?

How to do using ast library, a = b + 3 or a = 3+b , both have same node type i.e. BinOp, you can validate variable “a” value and its node type. For each line of code, create AST node then compare value, node type and other parameters as well like operator, operand, function name, class name, index, etc… if required.


1 Answers

Question 1: no, the grammar itself may not be enough. Take the example of an ambiguous grammar. If you ended up with a unique leftmost (rightmost) derivation (the AST) for a given string, you would somehow have to know how the parser eliminated the ambiguity. Just think of the string 'a+b*c' with the naive grammar for expressions 'E:=E+E|E*E|...'.

Question 3: none of the grammar examples you give is context sensitive. The lefthand-side of the productions are a single non-terminal, there is no context.

like image 198
user1666959 Avatar answered Oct 07 '22 02:10

user1666959