Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to match a string, but case-insensitively?

Tags:

Let's say that I want to match "beer", but don't care about case sensitivity.

Currently I am defining a token to be ('b'|'B' 'e'|'E' 'e'|'E' 'r'|'R') but I have a lot of such and don't really want to handle 'verilythisisaverylongtokenindeedomyyesitis'.

The antlr wiki seems to suggest that it can't be done (in antlr) ... but I just wondered if anyone had some clever tricks ...

like image 459
Mawg says reinstate Monica Avatar asked Dec 04 '09 02:12

Mawg says reinstate Monica


People also ask

How do you match a word with a case-insensitive in RegEx?

The i modifier is used to perform case-insensitive matching. For example, the regular expression /The/gi means: uppercase letter T , followed by lowercase character h , followed by character e . And at the end of regular expression the i flag tells the regular expression engine to ignore the case.

How do you compare strings case-insensitive?

The equalsIgnoreCase() method of the String class is similar to the equals() method the difference if this method compares the given string to the current one ignoring case.

How do you match a case-insensitive in Python?

Using the casefold() method is the strongest and the most aggressive approach to string comparison in Python. It's similar to lower() , but it removes all case distinctions in strings. This is a more efficient way to make case-insensitive comparisons in Python.

How do you ignore case sensitive in RegEx?

Case-Sensitive Match To disable case-sensitive matching for regexp , use the 'ignorecase' option. For multiple expressions, enable and disable case-insensitive matching for selected expressions using the (? i) and (?-i) search flags.


2 Answers

I would like to add to the accepted answer: a ready -made set can be found at case insensitive antlr building blocks, and the relevant portion included below for convenience

fragment A:[aA];
fragment B:[bB];
fragment C:[cC];
fragment D:[dD];
fragment E:[eE];
fragment F:[fF];
fragment G:[gG];
fragment H:[hH];
fragment I:[iI];
fragment J:[jJ];
fragment K:[kK];
fragment L:[lL];
fragment M:[mM];
fragment N:[nN];
fragment O:[oO];
fragment P:[pP];
fragment Q:[qQ];
fragment R:[rR];
fragment S:[sS];
fragment T:[tT];
fragment U:[uU];
fragment V:[vV];
fragment W:[wW];
fragment X:[xX];
fragment Y:[yY];
fragment Z:[zZ];

So an example is

   HELLOWORLD : H E L L O W O R L D;
like image 100
WestCoastProjects Avatar answered Oct 14 '22 21:10

WestCoastProjects


How about define a lexer token for each permissible identifier character, then construct the parser token as a series of those?

beer: B E E R;

A : 'A'|'a';
B: 'B'|'b';

etc.

like image 25
Jonathan Feinberg Avatar answered Oct 14 '22 20:10

Jonathan Feinberg