I'm trying to parse a command that looks like _SC play Piano 1 into a tree with three nodes: "_SC", "play" and "Piano 1".
The grammar I've got so far is:
grammar PBScript;
options {
output = AST;
language = CSharp2;
}
line : COMMAND WS ACTION;
COMMAND : '_SC';
ACTION : 'play';
WS : (' '|'\t')+ ;
When I create another rule to capture the "Piano 1" part like so:
grammar PBScript;
options {
output = AST;
language = CSharp2;
}
line : COMMAND WS ACTION WS PARAMETER;
COMMAND : '_SC';
ACTION : 'play';
WS : (' '|'\t')+;
PARAMETER
: (~('\n'|'\r'))+ ;
I get a MismatchedTokenException(6!=5). I get that the grammar is wrong and I know partially why it's wrong. It's ambiguous because WS overlaps PARAMETER. I just don't know how to fix it.
There are other actions besides _SC, and PARAMETER should be optional. Eventually there will also be a different line type that looks like Name: blah blah blah, where I'll need at least "Name" and "blah blah blah" in the tree, in case that matters. Right now, though, I'm just trying to figure out what to use for PARAMETER.
~Tom
EDIT: "Piano 1" stands for any string of non-newline characters, i.e. everything from the first non-whitespace character after play to the end of the line.
You can't use a PARAMETER rule like that in your lexer. ANTLR's lexer matches tokens greedily, so PARAMETER would gobble up the entire line: no COMMAND or ACTION tokens will ever be created.
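To make the greedy-matching point concrete, here is a minimal Python sketch of a longest-match tokenizer. It is a simplified model, not ANTLR's actual lexer implementation; the `RULES` table and the `tokenize` helper are hypothetical names that only mirror the four lexer rules from the question.

```python
import re

# Hypothetical rule table mirroring the question's lexer, in declaration order.
RULES = [
    ("COMMAND", re.compile(r"_SC")),
    ("ACTION", re.compile(r"play")),
    ("WS", re.compile(r"[ \t]+")),
    ("PARAMETER", re.compile(r"[^\r\n]+")),
]

def tokenize(text):
    """Simplified longest-match lexer: at each position the longest match
    wins, and ties go to the rule declared first."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# PARAMETER matches more characters than COMMAND at position 0,
# so the whole line becomes a single PARAMETER token.
print(tokenize("_SC play Piano 1"))
```

Under this model the parser never sees a COMMAND token at all, which is why a catch-all lexer rule cannot sit next to the keyword rules.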
To be able to match something to the end of the line, you'd need a parser rule for it. But then the parser must have a notion of what a new line is (i.e. the lexer will need to produce new-line tokens).
grammar T;
options {
output=AST;
}
tokens {
LINE;
PARAMS;
}
line
: COMMAND ACTION rest_of_line NL
-> ^(LINE COMMAND ACTION ^(PARAMS rest_of_line))
;
rest_of_line
: ~NL* // match any token other than a line break zero or more times
;
COMMAND : '_SC';
ACTION : 'play';
WORD : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
WS : (' '|'\t')+ {skip();};
NL : '\r'? '\n' | '\r';
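If you now parse your input "_SC play Piano 1" (followed by a line break, which the line rule requires), you'd end up with the following AST, written here in the textual form that follows from the rewrite rule:
(LINE _SC play (PARAMS Piano 1))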
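The same idea (the lexer emits a line-break token, and a parser rule collects every token up to it) can be sketched in plain Python. This is only an illustration under stated assumptions, not generated ANTLR code: `TOKEN_RE`, `lex` and `parse_line` are hypothetical names, NUMBER is assumed to accept a run of digits, and the regex picks the first matching alternative rather than reproducing ANTLR's lexer behaviour.

```python
import re

# Hypothetical token table modelled on the grammar above.
# Note: regex alternation takes the first alternative that matches,
# which is enough for the sample input but is not how ANTLR decides.
TOKEN_RE = re.compile(r"""
    (?P<COMMAND>_SC)
  | (?P<ACTION>play)
  | (?P<WORD>[A-Za-z]+)
  | (?P<NUMBER>[0-9]+)
  | (?P<WS>[ \t]+)
  | (?P<NL>\r?\n|\r)
""", re.VERBOSE)

def lex(text):
    """Turn text into (type, text) pairs, skipping WS but keeping NL."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def parse_line(tokens):
    """line : COMMAND ACTION rest_of_line NL"""
    if len(tokens) < 3 or tokens[0][0] != "COMMAND" or tokens[1][0] != "ACTION":
        raise ValueError("expected COMMAND ACTION ... NL")
    rest, i = [], 2
    # rest_of_line: every token that is not a line break
    while i < len(tokens) and tokens[i][0] != "NL":
        rest.append(tokens[i][1])
        i += 1
    if i == len(tokens):
        raise ValueError("missing line break")
    return ("LINE", tokens[0][1], tokens[1][1], ("PARAMS", *rest))

print(parse_line(lex("_SC play Piano 1\n")))
```

Because whitespace is skipped, the parameter arrives as separate tokens ("Piano", "1") rather than as one string; joining them back together would be a separate step.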
This grammar will parse your _SC play Piano 1 statement, provided each line is prefixed with command: as in the sample input further down:
grammar PBScript;
options {
language = CSharp2;
output=AST;
}
tokens
{
COMMAND;
ACTION;
PARAM;
}
program : lines;
lines : line*;
line: 'command:' command action parameter param_modifier
;
command
: IDENTIFIER
-> ^(COMMAND IDENTIFIER)
;
action : IDENTIFIER
-> ^(ACTION IDENTIFIER)
;
parameter : IDENTIFIER
-> ^(PARAM IDENTIFIER)
;
param_modifier : INTEGER
;
IDENTIFIER : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
INTEGER : '0'..'9'+
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
Then for the input:
command: _SC play Piano 1
command: _SR doSomething someInstrument 2
You will get the following parse tree: [image: parse tree for the two command lines]
Then, when you write your AST (tree) grammar, you should check the command names, for example: if the command name is _SC, do something, and so on.
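As a rough illustration of that name check, here is a small Python sketch that dispatches on the command name. Everything in it is hypothetical: the tuple layout `(command, action, parameter, modifier)`, the handler functions and the `HANDLERS` table are assumptions made for the example, not part of ANTLR or of the grammar above.

```python
# Hypothetical handlers, one per command name.
def handle_sc(action, param, modifier):
    return f"_SC handler: {action} {param} {modifier}"

def handle_sr(action, param, modifier):
    return f"_SR handler: {action} {param} {modifier}"

# Lookup table instead of a chain of "if name == ..." checks.
HANDLERS = {"_SC": handle_sc, "_SR": handle_sr}

def dispatch(line):
    """line is assumed to be (command, action, parameter, modifier)."""
    command, action, param, modifier = line
    handler = HANDLERS.get(command)
    if handler is None:
        raise ValueError(f"unknown command: {command}")
    return handler(action, param, modifier)

lines = [
    ("_SC", "play", "Piano", 1),
    ("_SR", "doSomething", "someInstrument", 2),
]
for line in lines:
    print(dispatch(line))
```

Keeping the command names out of the grammar and resolving them in a table like this means new commands can be added without touching the lexer.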