Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to leave out unimportant nodes?

I am using ANTLR 4.9.2 to parse a grammar that represents assembly instructions.

grammar IrohAsm;
main: line* | EOF;

line: (rangedec | instruction | comment)? EOL;

instruction: MNEMONIC firstoperand COMMA secondoperand;
rangedec : range assignment?;
firstoperand : range | mem | REGISTER;
secondoperand : range | mem | IMM | REGISTER;
range : IDENTIFIER OPENBRACKETS IMM CLOSEDBRACKETS;
assignment : EQUALS OPENCURL IMM (COMMA IMM)* CLOSECURL;
mem : AT IMM;
comment : '#' ~EOL*;

WHITESPACE : (' ') -> skip ;

// remember to append \n to input
EOL : '\n';

OPENCURL : '{';
CLOSECURL : '}';
OPENBRACKETS : '[';
CLOSEDBRACKETS : ']';
COMMA : ',';
EQUALS : '=';
AT : '@';
MNEMONIC : ('jmp' | 'add' | 'sub' | 'jez' | 'mov' | 'wrt' | 'get');
REGISTER: ('ab' | 'bb' | 'cb' | 'db');
IMM : DIGITS RADIX?;
RADIX : ('d' | 'b' | 'h');
DIGITS : [0-9]+;
IDENTIFIER: ([a-zA-Z0-9] | '$' | '_' | '\u00C0'..'\uFFFF')+ ;

The grammar works fine, but generates trees like the following;

Parse Tree Example

when given the following input:

mov ab,ab

As you can see, COMMA is included as one of the children of instruction. Its placement is important for the language, but I don't really care about it after parsing. Is there some way I could leave it off the final tree entirely? And if so, would this be a change to the grammar, or my code to parse the tree?

Removing Extraneous Nodes

My current code to get the tree:

CharStream inputStream = CharStreams.fromFileName("src/test/assembly/cool.asm");
IrohAsmLexer lexer = new IrohAsmLexer(inputStream);
IrohAsmParser parser = new IrohAsmParser(new CommonTokenStream(lexer));
ParseTree parseTree = parser.main();
like image 299
Michael Avatar asked Mar 01 '23 10:03

Michael


2 Answers

Your question boils down to: "how can I convert my parse tree to an abstract syntax tree?". The simple answer to that is: "you can't" :). At least, not using a built-in ANTLR mechanism. You'll have to traverse the parse tree (using ANTLR's visitor- or listener mechanism) and construct your AST manually.

The feature to more easily create AST's from a parse tree often pops up both in ANTLR's Github repo:

  • https://github.com/antlr/antlr4/issues/2428

as well as on stackoverflow:

  • Generate AST using ANTLR4 generated visitor
  • ANTLR4 AST Creation - How to create an AstVistor
like image 75
Bart Kiers Avatar answered Mar 03 '23 23:03

Bart Kiers


For ANTLR, as Bart says, you can't do it without doing it yourself. Essentially you have to write custom code to walk over the CST and construct a custom AST.

It doesn't have to be this way. You can construct parser generators that automatically build trees that:

  • Leave out nodes that don't carry any values (e.g., "comma", EOL)
  • Remove chains of unary productions (e.g., "firstoperand" and "secondoperand; these are short chains in your example but expression trees tend to have long ones)
  • Construct node-lists whenever right or left leaning trees are generated from grammar rules that represent lists, replacing the "spine" nodes and the list-ending node if any with a single list node. (Your example doesn't have any of these). This makes lists smaller and easier to manipulate.

The result gives something very close to classic ASTs without any manual effort; the resulting trees are often 30-50% of the size of the CSTs from which they are automatically derived.

This matters when

  • working with large grammars (you don't want have define the mapping of every single node by hand for 3500 rules)
  • working with grammars that are rapidly changing (as you try get a first working draft right)
  • building tools that process very large trees (representing very large [systems] of source code, say the Linux source tree) that have to touch billions of nodes to compute the answer, by avoiding lots of accesses to cache lines that would contain concrete nodes or list spines.

I'd provide the name of tool that does this that I designed and built, but some SO people hate it when I do so. You can check my profile.

like image 39
Ira Baxter Avatar answered Mar 04 '23 00:03

Ira Baxter