I am using ANTLR 4.9.2 to parse a grammar that represents assembly instructions.
grammar IrohAsm;
main: line* | EOF;
line: (rangedec | instruction | comment)? EOL;
instruction: MNEMONIC firstoperand COMMA secondoperand;
rangedec : range assignment?;
firstoperand : range | mem | REGISTER;
secondoperand : range | mem | IMM | REGISTER;
range : IDENTIFIER OPENBRACKETS IMM CLOSEDBRACKETS;
assignment : EQUALS OPENCURL IMM (COMMA IMM)* CLOSECURL;
mem : AT IMM;
comment : '#' ~EOL*;
WHITESPACE : (' ') -> skip ;
// remember to append \n to input
EOL : '\n';
OPENCURL : '{';
CLOSECURL : '}';
OPENBRACKETS : '[';
CLOSEDBRACKETS : ']';
COMMA : ',';
EQUALS : '=';
AT : '@';
MNEMONIC : ('jmp' | 'add' | 'sub' | 'jez' | 'mov' | 'wrt' | 'get');
REGISTER: ('ab' | 'bb' | 'cb' | 'db');
IMM : DIGITS RADIX?;
RADIX : ('d' | 'b' | 'h');
DIGITS : [0-9]+;
IDENTIFIER: ([a-zA-Z0-9] | '$' | '_' | '\u00C0'..'\uFFFF')+ ;
The grammar works fine, but when given the following input it generates a parse tree that keeps every matched token as a node:
mov ab,ab
As you can see, COMMA is included as one of the children of instruction. Its placement is important for the language, but I don't really care about it after parsing. Is there some way I could leave it off the final tree entirely? And if so, would this be a change to the grammar, or my code to parse the tree?
My current code to get the tree:
CharStream inputStream = CharStreams.fromFileName("src/test/assembly/cool.asm");
IrohAsmLexer lexer = new IrohAsmLexer(inputStream);
IrohAsmParser parser = new IrohAsmParser(new CommonTokenStream(lexer));
ParseTree parseTree = parser.main();
Your question boils down to: "how can I convert my parse tree to an abstract syntax tree?". The simple answer to that is: "you can't" :). At least, not using a built-in ANTLR mechanism. You'll have to traverse the parse tree (using ANTLR's visitor or listener mechanism) and construct your AST manually.
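To make "traverse the parse tree and construct your AST manually" concrete, here is a minimal sketch of the idea. It deliberately does not use the ANTLR runtime, so that it compiles on its own: `Cst`, `Instruction`, `textOf` and `toAst` are hypothetical stand-ins invented for this illustration. With the real grammar you would more likely generate a visitor (the `-visitor` option of the ANTLR tool), extend the generated `IrohAsmBaseVisitor`, and do the equivalent work in `visitInstruction`, reading `ctx.MNEMONIC()`, `ctx.firstoperand()` and `ctx.secondoperand()` and never asking for `ctx.COMMA()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, not ANTLR classes: with ANTLR, the role of Cst
// is played by the generated context classes and terminal nodes.
final class AstSketch {

    /** Minimal stand-in for a parse-tree (CST) node. */
    static final class Cst {
        final String name;        // rule name or token type, e.g. "instruction", "COMMA"
        final String text;        // matched text for tokens, null for rules
        final List<Cst> children; // empty for tokens

        private Cst(String name, String text, List<Cst> children) {
            this.name = name;
            this.text = text;
            this.children = children;
        }

        static Cst token(String type, String text) {
            return new Cst(type, text, new ArrayList<Cst>());
        }

        static Cst rule(String name, Cst... children) {
            return new Cst(name, null, Arrays.asList(children));
        }
    }

    /** AST node for one instruction; it simply has no slot for the comma. */
    static final class Instruction {
        final String mnemonic;
        final String first;
        final String second;

        Instruction(String mnemonic, String first, String second) {
            this.mnemonic = mnemonic;
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return mnemonic + "(" + first + ", " + second + ")";
        }
    }

    /** Concatenates the text of all tokens at or below a node. */
    static String textOf(Cst node) {
        if (node.text != null) {
            return node.text;
        }
        StringBuilder sb = new StringBuilder();
        for (Cst child : node.children) {
            sb.append(textOf(child));
        }
        return sb.toString();
    }

    /** Converts an "instruction" CST node to an AST node, skipping COMMA. */
    static Instruction toAst(Cst instruction) {
        List<String> parts = new ArrayList<String>();
        for (Cst child : instruction.children) {
            if ("COMMA".equals(child.name)) {
                continue; // needed for parsing, irrelevant afterwards
            }
            parts.add(textOf(child));
        }
        return new Instruction(parts.get(0), parts.get(1), parts.get(2));
    }
}
```

The point of the design is that the grammar stays untouched: the comma is still required in the input, and it is the tree-walking code that decides what ends up in the AST.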
Requests for a feature that makes it easier to create ASTs from a parse tree come up often, both in ANTLR's GitHub repo and on Stack Overflow.
For ANTLR, as Bart says, you can't do it without doing it yourself. Essentially you have to write custom code to walk over the CST and construct a custom AST.
It doesn't have to be this way. You can construct parser generators that automatically build trees that:
The result gives something very close to classic ASTs without any manual effort; the resulting trees are often 30-50% of the size of the CSTs from which they are automatically derived.
This matters when
I'd provide the name of the tool that does this, which I designed and built, but some SO people hate it when I do so. You can check my profile.