Please see the source code available at: https://gist.github.com/1684022.
I've got two tokens defined:
ID : ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | ' ')*;
PITCH
: (('A'|'a') '#'?)
| (('B'|'b') '#'?)
| (('C'|'c') '#'?);
Obviously, the letter "A" is ambiguous, since it matches both ID and PITCH.
I further define:
note : PITCH;
name : ID;
main : name ':' note '\n'?;
Now, if I enter "A:A" as input to the parser, I always get an error. Depending on whether ID or PITCH is defined first, the parser reports that it expected either an ID or a PITCH, e.g.:
mismatched input 'A' expecting ID
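To see where both variants of this error come from, here is a simplified Python model of the lexer behaviour discussed in this thread: the longest match wins, and a tie goes to the rule defined first. It is an assumption-laden sketch, not ANTLR's actual implementation; the regexes are hand translations of the question's ID and PITCH rules, and COLON is a hypothetical name for the ':' literal.

```python
import re

# Hand translations of the question's token rules (COLON stands in for ':').
ID = ("ID", r"[a-zA-Z][0-9a-zA-Z ]*")
PITCH = ("PITCH", r"[AaBbCc]#?")
COLON = ("COLON", r":")

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # strict '>' keeps the earlier rule when the lengths are equal
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no token matches {text[pos]!r} at {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

def check_main(tokens):
    """Mimic 'main : name ':' note' with name = ID and note = PITCH."""
    expected = ["ID", "COLON", "PITCH"]
    if len(tokens) < len(expected):
        return "missing input"
    for (kind, lexeme), want in zip(tokens, expected):
        if kind != want:
            return f"mismatched input {lexeme!r} expecting {want}"
    return "ok"

print(check_main(tokenize("A:A", [PITCH, ID, COLON])))  # mismatched input 'A' expecting ID
print(check_main(tokenize("A:A", [ID, PITCH, COLON])))  # mismatched input 'A' expecting PITCH
```

With PITCH first, the leading "A" becomes a PITCH where name needs an ID; with ID first, the trailing "A" becomes an ID where note needs a PITCH. Either way one of the two positions gets the wrong token type.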
What is the proper way to resolve this so that it works as intended?
As described, although it is intuitively clear how the parsing should work, ANTLR doesn't do the "right thing". Even though the main rule says a name/ID should come first, the lexer is unaware of this (it tokenizes the input without knowing which parser rule is active) and identifies "A" as a PITCH, because it follows the "longest match"/"whichever rule comes first" rule rather than the more reasonable "what the parser rule says" rule.
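The two halves of that lexer rule can be separated with a small Python sketch (a simplified model, assuming the same regex translations of the question's ID and PITCH rules; first_token is a hypothetical helper). Longer matches win regardless of rule order, so rule order only matters for the single letter where both rules match the same length.

```python
import re

# Hand translations of the question's ID and PITCH token rules.
ID = ("ID", r"[a-zA-Z][0-9a-zA-Z ]*")
PITCH = ("PITCH", r"[AaBbCc]#?")

def first_token(text, rules):
    """Return (rule, lexeme) for the token emitted at position 0."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # A longer match wins; an equal-length match keeps the earlier rule.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

for word in ["A", "A#", "Ab"]:
    print(word, first_token(word, [ID, PITCH]), first_token(word, [PITCH, ID]))
# A ('ID', 'A') ('PITCH', 'A')
# A# ('PITCH', 'A#') ('PITCH', 'A#')
# Ab ('ID', 'Ab') ('ID', 'Ab')
```

"A#" and "Ab" are settled by length alone; only the bare "A" depends on which rule is listed first, and neither choice knows what the parser expects at that position.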
Is the only solution to fake/hack it by matching both ID and PITCH, and then recombining them later as dasblinkenlight says?
Here is how I would refactor this grammar to make it work:
ID : (('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | ' ')+)
| ('d'..'z' | 'D'..'Z');
PITCH : 'a'..'c' | 'A'..'C';
SHARP : '#';
note : PITCH SHARP?;
name : ID | PITCH;
main : name ':' note '\n'? EOF;
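This separates multi-character names from one-character pitch names, which are "reunited" in the parser through the name rule (ID | PITCH). Single letters other than a to c remain IDs. The sharp sign also gets its own token, SHARP, which the note rule accepts as an optional token after PITCH.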
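A rough Python simulation of the refactored grammar shows the lexer and parser halves working together. It is a sketch under stated assumptions, not generated ANTLR code: the regexes are hand translations of the ID, PITCH and SHARP rules, COLON and NEWLINE are hypothetical names for the ':' and '\n' literals, and parse_main is a hypothetical hand-written recursive-descent stand-in for the main rule.

```python
import re

# Hand translations of the refactored token rules, in definition order.
RULES = [
    ("ID", r"[a-zA-Z][0-9a-zA-Z ]+|[d-zD-Z]"),  # 2+ chars, or one letter outside a-c
    ("PITCH", r"[a-cA-C]"),
    ("SHARP", r"#"),
    ("COLON", r":"),      # stand-in for the ':' literal
    ("NEWLINE", r"\n"),   # stand-in for the '\n' literal
]

def tokenize(text):
    """Longest match wins; on a tie, the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no token matches {text[pos]!r} at {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

def parse_main(text):
    """main : name COLON note NEWLINE? EOF, returning (name, note)."""
    toks = tokenize(text) + [("EOF", "")]
    i = 0

    def expect(*kinds):
        nonlocal i
        kind, lexeme = toks[i]
        if kind not in kinds:
            raise SyntaxError(f"mismatched input {lexeme!r} expecting {' or '.join(kinds)}")
        i += 1
        return lexeme

    name = expect("ID", "PITCH")   # name : ID | PITCH  (the "reunion")
    expect("COLON")
    note = expect("PITCH")         # note : PITCH SHARP?
    if toks[i][0] == "SHARP":
        note += expect("SHARP")
    if toks[i][0] == "NEWLINE":
        expect("NEWLINE")
    expect("EOF")
    return name, note

print(parse_main("A:A"))
print(parse_main("Bob Smith:C#\n"))
```

The design choice is the same as in the grammar: the lexer never has to guess whether a lone "A" is a name or a pitch, because the parser rule for name accepts either token type.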