I'm attemping to learn language parsing for fun...
I've created a ANTLR grammar which I believe will match a simple language I am hoping to implement. It will have the following syntax:
<FunctionName> ( <OptionalArguments>+) {
<OptionalChildFunctions>+
}
Actual Example:
ForEach(in:[1,2,3,4,5] as:"nextNumber") {
Print(message:{nextNumber})
}
I believe I have the grammar working correctly to match this construct, and now I am attemping to build an Abstract Syntax Tree for the language.
Firstly, I must admit I'm not exactly sure HOW this tree should look. Secondly, I'm at a complete loss how to do this in my Antlr grammar...I've been trying without much success for hours.
This is the current idea I'm going with for the tree:
FunctionName
/ \
Attributes \
/ \ / \
ID /\ ChildFunctions
/ \ ID etc
/ \
Attribute AttributeValue
Type
This is my current Antlr grammar file:
grammar Test;
options {output=AST;ASTLabelType=CommonTree;}
program : function ;
function : ID (OPEN_BRACKET (attribute (COMMA? attribute)*)? CLOSE_BRACKET)? (OPEN_BRACE function* CLOSE_BRACE)?;
attribute : ID COLON datatype;
datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array : OPEN_BOX (datatype (COMMA datatype)* )? CLOSE_BOX ;
lookup : OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE;
NUMBER
: ('+' | '-')? (INTEGER | FLOAT)
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
BOOLEAN
: 'true' | 'TRUE' | 'false' | 'FALSE'
;
ID : (LETTER|'_') (LETTER | INTEGER |'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WHITESPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;
COLON : ':' ;
COMMA : ',' ;
PERIOD : '.' ;
OPEN_BRACKET : '(' ;
CLOSE_BRACKET : ')' ;
OPEN_BRACE : '{' ;
CLOSE_BRACE : '}' ;
OPEN_BOX : '[' ;
CLOSE_BOX : ']' ;
fragment
LETTER
: 'a'..'z' | 'A'..'Z'
;
fragment
INTEGER
: '0'..'9'+
;
fragment
FLOAT
: INTEGER+ '.' INTEGER*
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
ANY help / advice would be great. I've tried reading dozens of tutorials and nothing about the AST generation seems to stick :(
Step 1 is to make the tree look like the little graph that you posted. Right now, you don't have any tree construction operators, so you're going to end up with a flat list.
See tree construction on the antlr.org website.
You can use ANTLRWorks to see what your getting for a parse tree and AST. Start adding tree construction operators and watch how things change.
EDIT / Additional Info:
Here's a process you can follow to give you a rough idea of how to do it:
There are two basic operators for tree construction - The exclamation mark !
which tells the compiler to not place the node within the AST, and the carot ^
, which tells ANTLR to make something the root node. Start by going through each non-terminal rule and deciding which elements don't need to be in the AST. For example, you don't need commas or parenthesis. Once you have all the information you can populate the a structure (or create your own AST structure) that provides all the information. Commas don't help any more, so add a !
to them. For example:
function: ID (OPEN_BRACKET! (attribute (COMMA!? attribute)*)? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;
Take a look at the AST in ANTLRWorks before and after. Compare.
ID
to be the root node, so add a ^
after ID
and compare in ANTLRWorks.Here's a few changes that bring it closer to what I think you want:
program : function ;
function : ID^ (OPEN_BRACKET! attributeList? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;
attributeList: (attribute (COMMA!? attribute)*);
attribute : ID COLON! datatype;
datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array : OPEN_BOX! (datatype^ (COMMA! datatype)* )? CLOSE_BOX!;
lookup : OPEN_BRACE! (ID (PERIOD! ID)*) CLOSE_BRACE!;
With that under your belt, now go look at some of the tutorials.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With