I want to define a grammar in which the lexer rule for an identifier, ID, matches only if the identifier is valid. The valid identifiers are passed to the lexer as a set.
Based on this answer, I came up with the following solution. I added another lexer constructor that takes the set of valid identifiers, then added a predicate at the end of the ID rule to disable it when the matched text is not in the set. The lexer then falls back to the UNKNOWN token (same pattern), and I was hoping to force a NoViableAltException, since no alternative of expr contains UNKNOWN. Instead I get InputMismatchExceptions, which may be fine, but none of the syntax errors that follow are reported any more.
grammar T;
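@lexer::members {
    // Identifiers the ID rule is allowed to match; empty unless set via the extra constructor.
    private java.util.Set<String> identifiers = new java.util.HashSet<>();

    // Additional constructor that receives the set of valid identifiers.
    public TLexer(CharStream input, java.util.Set<String> s) {
        this(input);
        this.identifiers = s;
    }

    // Called from the predicate on the ID rule.
    public boolean hasIdentifier(String s) {
        return identifiers.contains(s);
    }
}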
prog: expr;
expr
    : expr ('AND' | 'OR') expr
    | ID '=' STRING
    | ID ('=' | '<' | '>') INT
    ;
INT: ...
STRING: ...
ID: [a-z][a-zA-Z0-9]* {hasIdentifier(getText())}?;
UNKNOWN: [a-z][a-zA-Z0-9]*;
WS: [ \r\n\t] -> skip;
Is there a better way so that any invalid identifiers are reported as actual syntax errors?
You will get further syntax errors only if the parser can recover from the first one. That ability depends heavily on the grammar. If no re-synchronization is possible, the parser eventually gives up and reports no more errors.
Therefore I wouldn't try to get more errors. Instead, report the first one to the user and let them fix it before moving on to any further errors.
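To make the point about recovery concrete, here is a minimal sketch in plain Java. It does not use the ANTLR runtime and is not ANTLR's actual recovery algorithm, which is more elaborate. The class `ResyncSketch`, the `check` method and the whitespace-split token list are all hypothetical, and because INT and STRING are elided in the grammar, it assumes INT is a run of digits and leaves string comparisons out. It checks a flat chain of comparisons joined by AND/OR and, after an error, skips ahead to the next AND/OR as its re-sync point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a parser; it does not use the ANTLR runtime.
final class ResyncSketch {

    // Checks: comparison (('AND' | 'OR') comparison)*, with comparison = ID op INT.
    // With recover == false the check stops at the first error.
    static List<String> check(List<String> tokens, Set<String> validIds, boolean recover) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        while (true) {
            String problem = comparisonProblem(tokens, i, validIds);
            if (problem == null) {
                i += 3; // consume ID, operator, INT
                if (i == tokens.size()) break; // clean end of input
                if (!isConnective(tokens.get(i))) {
                    problem = "token " + i + ": expected AND or OR but found '" + tokens.get(i) + "'";
                }
            }
            if (problem != null) {
                errors.add(problem);
                if (!recover) break; // report one error and stop
                // Re-sync: skip tokens until the next AND/OR.
                while (i < tokens.size() && !isConnective(tokens.get(i))) i++;
                if (i == tokens.size()) break; // no sync point left, give up
            }
            i++; // step over the AND/OR
        }
        return errors;
    }

    // Returns a message for the first problem in the comparison starting at i, or null if it is fine.
    private static String comparisonProblem(List<String> t, int i, Set<String> validIds) {
        if (i + 2 >= t.size()) return "token " + i + ": incomplete comparison";
        String id = t.get(i), op = t.get(i + 1), value = t.get(i + 2);
        if (!id.matches("[a-z][a-zA-Z0-9]*")) return "token " + i + ": expected an identifier but found '" + id + "'";
        if (!validIds.contains(id)) return "token " + i + ": unknown identifier '" + id + "'";
        if (!op.matches("[=<>]")) return "token " + (i + 1) + ": expected an operator but found '" + op + "'";
        // Assumption: INT is a run of digits (the INT rule is elided in the grammar).
        if (!value.matches("[0-9]+")) return "token " + (i + 2) + ": expected an integer but found '" + value + "'";
        return null;
    }

    private static boolean isConnective(String token) {
        return token.equals("AND") || token.equals("OR");
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a = 1 AND zz < 2 OR b > x".split(" "));
        Set<String> valid = new HashSet<>(Arrays.asList("a", "b"));
        check(tokens, valid, true).forEach(System.out::println);  // with re-sync
        System.out.println("--");
        check(tokens, valid, false).forEach(System.out::println); // first error only
    }
}
```

With re-sync enabled, the sample input yields two errors (the unknown identifier `zz` and the non-integer `x`); without it, only the first is reported. The second error is found only because an AND/OR follows the first one. Had the bad token been the last thing in the input, there would be nothing to re-synchronize on.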