Abstract problem description:
The way I see it, unparsing means to create a token stream from an AST, which when parsed again produces an equal AST.
So parse(unparse(AST)) = AST
holds.
This is the equal to finding a valid parse tree which would produce the same AST.
The language is described by a context free S-attributed grammar using a eBNF variant.
So the unparser has to find a valid 'path' through the traversed nodes in which all grammar constraints hold. This bascially means to find a valid allocation of AST nodes to grammar production rules. This is a constraint satisfaction problem (CSP) in general and could be solved, like parsing, by backtracking in O(exp(n)).
Fortunately for parsing, this can be done in O(n³) using GLR (or better restricting the grammar). Because the AST structure is so close to the grammar production rule structure, I was really surprised seeing an implementation where the runtime is worse than parsing: XText uses ANTLR for parsing and backtracking for unparsing.
Questions
Example1:
Area returns AnyObject -> Pedestrian | Highway
Highway returns AnyObject -> "Foo" Car
Pedestrian returns AnyObject -> "Bar" Bike
Car returns Vehicle -> anyObjectInstance.name="Car"
Bike returns Vehicle -> anyObjectInstance.name="Bike"
So if I have an AST containing
AnyObject -> AnyObject -> Vehicle [name="Car"]
and I know the root can be Area, I could resolve it to
Area -> Highway -> Car
So the (Highway | Pedestrian) decision depends on the subtree decisions. The problem get's worse when a leaf might be, at first sight, one of several types, but has to be a specific one to form a valid path later on.
Example2:
So if I have S-attribute rules returning untyped objects, just assigning some attributes, e.g.
A -> B C {Obj, Obj}
X -> Y Z {Obj, Obj}
B -> "somekeyword" {0}
Y -> "otherkeyword" {0}
C -> "C" {C}
Z -> "Z" {Z}
So if an AST contains
Obj
/ \
"0" "C"
I can unparse it to
A
/ \
B C
just after I could resolve "C" to C.
While traversing the AST, all constraints I can generate from the grammar are satisfied for both rules, A and X, until I hit "C". This means that for
A -> B | C
B -> "map" {MagicNumber_42}
C -> "foreach" {MagicNumber_42}
both solutions for the tree
Obj
|
MagicNumber_42
are valid and it is considered that they have equal semantics ,e.g. syntactic sugar.
Further Information:
unparsing in XText
grammar constraints for unparsing, see Serializer: Concrete Syntax Validation
Abstract Syntax Trees or ASTs are tree representations of code. They are a fundamental part of the way a compiler works. When a compiler transforms some code there are fundamentally the following steps: Lexical Analysis.
Source code: Lib/ast.py. The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.
How to do using ast library, a = b + 3 or a = 3+b , both have same node type i.e. BinOp, you can validate variable “a” value and its node type. For each line of code, create AST node then compare value, node type and other parameters as well like operator, operand, function name, class name, index, etc… if required.
Question 1: no, the grammar itself may not be enough. Take the example of an ambiguous grammar. If you ended up with a unique leftmost (rightmost) derivation (the AST) for a given string, you would somehow have to know how the parser eliminated the ambiguity. Just think of the string 'a+b*c' with the naive grammar for expressions 'E:=E+E|E*E|...'.
Question 3: none of the grammar examples you give is context sensitive. The lefthand-side of the productions are a single non-terminal, there is no context.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With