Java regex: Case-insensitive matching for non-English characters

Tags: java, regex, locale

I am trying to perform case-insensitive matching with the Pattern and Matcher classes in Java for Russian text. Below is the text:

"some text газированных напитков some other text"

Below is the Pattern I am using to match the text:

Pattern pattern = Pattern.compile("(?iu)\\b(" + Pattern.quote("напитки") + ")\\b", Pattern.UNICODE_CHARACTER_CLASS);

I expect the following to return true, since it is a case-insensitive comparison (напитки vs напитков):

System.out.println(pattern.matcher("some text газированных напитков some other text").find());

But it always returns false. I have tried other Pattern constants (such as CASE_INSENSITIVE, UNICODE_CASE, and CANON_EQ), but it still returns false.

Is there any way to perform such a comparison in Java? Is it possible at all?

Darshan Mehta asked Dec 03 '25 14:12


1 Answer

Just add these flags to your Pattern:

Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE

This has worked in all my cases for Cyrillic, and I use it extensively.
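For illustration, here is a minimal sketch of how these flags might be applied. The class name CyrillicCaseDemo and the helper containsWord are hypothetical, not part of the original post. Note that напитки and напитков differ in their endings, not in letter case, so the second call is expected to print false whatever case flags are set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class CyrillicCaseDemo {

    // Hypothetical helper: whole-word, case-insensitive search for a literal word.
    static boolean containsWord(String text, String word) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE
                        | Pattern.UNICODE_CASE
                        | Pattern.UNICODE_CHARACTER_CLASS);
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        // Same word, different letter case: prints true.
        System.out.println(containsWord(
                "some text ГАЗИРОВАННЫХ НАПИТКИ some other text", "напитки"));

        // Different inflected form (напитков vs напитки): prints false,
        // because the two words differ in their letters, not only in case.
        System.out.println(containsWord(
                "some text газированных напитков some other text", "напитки"));
    }
}
```

If the goal is to match several grammatical forms of a word, a case-insensitive flag alone will not do it; the pattern itself would have to cover the different endings.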

Vitaliy answered Dec 06 '25 05:12