I am trying to get my head around the tree construction operators (^ and !) in ANTLR.
I have a grammar for flex byte arrays (a UINT16 that describe number of bytes in array, followed by that many bytes). I've commented out all of the semantic predicates and their associated code that does validate that there are as many bytes in the array as indicated by the first two bytes...that part isn't what I'm having problems with.
My issue is with the tree that is generated after parsing some input. All that happens is that each character is a sibling node. I was expecting that the AST generated would be similar to the tree that you can see in Interpreter window of ANTLRWorks 1.4. As soon as I try to change how the tree would be made using a ^ character, I get an exception of the form:
Unhandled Exception: System.SystemException: more than one node as root (TODO: make exception hierarchy)
Here's the grammar (currently targeting C#):
grammar FlexByteArray_HexGrammar;
options
{
//language = 'Java';
language = 'CSharp2';
output=AST;
}
expr
: array_length remaining_data
//the amount of remaining data must be equal to the array_length (times 2 since 2 hex characters per byte)
// need to check if the array length is zero first to avoid checking $remaining_data.text (null reference) in that situation.
//{ ($array_length.value == 0 && $remaining_data.text == null) || ($remaining_data.text != null && $array_length.value*2 == $remaining_data.text.Length) }?
;
array_length //returns [UInt16 value]
: uint16_little //{ $value = $uint16_little.value; }
;
hex_byte1 //needed just so I can distinguish between two bytes in a uint16 when doing a semantic predicate (or whatever you call it where I write in the target language in curly brackets)
: hex_byte
;
uint16_big //returns [UInt16 value]
: hex_byte1 hex_byte //{ $value = Convert.ToUInt16($hex_byte.text + $hex_byte1.text); }
;
uint16_little //returns [UInt16 value]
: hex_byte1 hex_byte //{ $value = Convert.ToUInt16($hex_byte1.text + $hex_byte.text); }
;
remaining_data
: hex_byte*
;
hex_byte
: HEX_DIGIT HEX_DIGIT
;
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F')
;
Here's kind of what I thought the AST would be:
Here's the program in C# I was using to get a visual (actually textual, but then I put it thru GraphViz to get picture) representation of the AST:
namespace FlexByteArray_Hex
{
using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.Utility.Tree;
public class Program
{
public static void Main(string[] args)
{
ICharStream input = new ANTLRStringStream("0001ff");
FlexByteArray_HexGrammarLexer lex = new FlexByteArray_HexGrammarLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lex);
FlexByteArray_HexGrammarParser parser = new FlexByteArray_HexGrammarParser(tokens);
Console.WriteLine("Parser created.");
CommonTree tree = parser.expr().Tree as CommonTree;
Console.WriteLine("------Input parsed-------");
if (tree == null)
{
Console.WriteLine("Tree is null.");
}
else
{
DOTTreeGenerator treegen = new DOTTreeGenerator();
Console.WriteLine(treegen.ToDOT(tree));
}
}
}
}
Here's what the output of that program put into GraphViz looks like:
The same program in Java (in case you want to try it out and don't use C#):
import org.antlr.*;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class Program
{
public static void main(String[] args) throws Exception
{
FlexByteArray_HexGrammarLexer lex = new FlexByteArray_HexGrammarLexer(new ANTLRStringStream("0001ff"));
CommonTokenStream tokens = new CommonTokenStream(lex);
FlexByteArray_HexGrammarParser parser = new FlexByteArray_HexGrammarParser(tokens);
System.out.println("Parser created.");
CommonTree tree = (CommonTree)parser.expr().tree;
System.out.println("------Input parsed-------");
if (tree == null)
{
System.out.println("Tree is null.");
}
else
{
DOTTreeGenerator treegen = new DOTTreeGenerator();
System.out.println(treegen.toDOT(tree));
}
}
}
Anssssss wrote:
As soon as I try to change how the tree would be made using a ^ character, I get an exception of the form:
When trying to make the the parser rule a
the root of the tree inside p
like this:
p : a^ b;
a : A A;
b : B B;
ANTLR does not know which of the A
's is the root of rule a
. And there can't be two roots, of course.
The inline tree operators are handy in certain cases, but in this case, they're not up to the task. You can't assign a root inside a production rule that can possible have no contents, like your remaining_data
rule. In that case you need to create "imaginary tokens" in the tokens { ... }
section of your grammar and use rewrite rule (-> ^( ... )
) to create your AST.
The following grammar:
grammar FlexByteArray_HexGrammar;
options {
output=AST;
}
tokens {
ROOT;
ARRAY;
LENGTH;
DATA;
}
expr
: array* EOF -> ^(ROOT array*)
;
array
@init { int index = 0; }
: array_length array_data[$array_length.value] -> ^(ARRAY array_length array_data)
;
array_length returns [int value]
: a=hex_byte b=hex_byte {$value = $a.value*16*16 + $b.value;} -> ^(LENGTH hex_byte hex_byte)
;
array_data [int length]
: ({length > 0}?=> hex_byte {length--;})* {length == 0}? -> ^(DATA hex_byte*)
;
hex_byte returns [int value]
: a=HEX_DIGIT b=HEX_DIGIT {$value = Integer.parseInt($a.text+$b.text, 16);}
;
HEX_DIGIT
: '0'..'9' | 'a'..'f' | 'A'..'F'
;
will parse the following input:
0001ff0002abcd
into the following AST:
as you can see using the following main class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
FlexByteArray_HexGrammarLexer lexer = new FlexByteArray_HexGrammarLexer(new ANTLRStringStream("0001ff0002abcd"));
FlexByteArray_HexGrammarParser parser = new FlexByteArray_HexGrammarParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.expr().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
array_data
parser rule, see this previous Q&A: What is a 'semantic predicate' in ANTLR?
-> ^( ... )
, see this previous Q&A: How to output the AST built using ANTLR?
To give a short explanation of the array_data
rule:
array_data [int length]
: ({length > 0}?=> hex_byte {length--;})* {length == 0}? -> ^(DATA hex_byte*)
;
As you already mentioned in the comments, you can pass one or more parameters to rules by adding [TYPE IDENTIFIER]
after the rule.
The first (gated) semantic predicate, {length > 0}?=>
, checks if length
is more than zero. If this is the case, the parser tries to match a hex_byte
after which the length
variable is decreased by one. This all stops when length
is zero, or when the parser has no more hex_byte
to be parsed which happens when the EOF is next in line. Because it could parse less than the mandatory number of hex_byte
s, there is a (validating) semantic predicate at the very end of the rule, {length == 0}?
, that ensures that the correct amount of hex_byte
s have been parsed (no more, and no less!).
Hope that clarifies it a bit more.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With