Let's say that I want to match "beer", but don't care about case sensitivity.
Currently I am defining a token to be ('b'|'B' 'e'|'E' 'e'|'E' 'r'|'R') but I have a lot of such and don't really want to handle 'verilythisisaverylongtokenindeedomyyesitis'.
The antlr wiki seems to suggest that it can't be done (in antlr) ... but I just wondered if anyone had some clever tricks ...
The i modifier is used to perform case-insensitive matching. For example, the regular expression /The/gi means: uppercase letter T , followed by lowercase character h , followed by character e . And at the end of regular expression the i flag tells the regular expression engine to ignore the case.
The equalsIgnoreCase() method of the String class is similar to the equals() method the difference if this method compares the given string to the current one ignoring case.
Using the casefold() method is the strongest and the most aggressive approach to string comparison in Python. It's similar to lower() , but it removes all case distinctions in strings. This is a more efficient way to make case-insensitive comparisons in Python.
Case-Sensitive Match To disable case-sensitive matching for regexp , use the 'ignorecase' option. For multiple expressions, enable and disable case-insensitive matching for selected expressions using the (? i) and (?-i) search flags.
I would like to add to the accepted answer: a ready -made set can be found at case insensitive antlr building blocks, and the relevant portion included below for convenience
fragment A:[aA];
fragment B:[bB];
fragment C:[cC];
fragment D:[dD];
fragment E:[eE];
fragment F:[fF];
fragment G:[gG];
fragment H:[hH];
fragment I:[iI];
fragment J:[jJ];
fragment K:[kK];
fragment L:[lL];
fragment M:[mM];
fragment N:[nN];
fragment O:[oO];
fragment P:[pP];
fragment Q:[qQ];
fragment R:[rR];
fragment S:[sS];
fragment T:[tT];
fragment U:[uU];
fragment V:[vV];
fragment W:[wW];
fragment X:[xX];
fragment Y:[yY];
fragment Z:[zZ];
So an example is
HELLOWORLD : H E L L O W O R L D;
How about define a lexer token for each permissible identifier character, then construct the parser token as a series of those?
beer: B E E R;
A : 'A'|'a';
B: 'B'|'b';
etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With