 

Java REGEX code to validate Indian language characters not working?

Why does the following code not work (it returns false) with Indian languages?

System.out.println(Charset.forName("UTF-8").encode("అనువాద")
                .asCharBuffer().toString().matches("\\p{L}+"));

System.out.println(Charset.forName("UTF-8").encode("स्वागत")
                .asCharBuffer().toString().matches("\\p{L}+"));

System.out.println(Charset.forName("UTF-8").encode("நல்வரவு")
                .asCharBuffer().toString().matches("\\p{L}+"));

All of the above calls return false. What is the problem with this regex? How can I validate any Unicode character in the world?

Suren Raju asked May 02 '13 10:05


1 Answer

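\p{L} (Letter) only matches letters, but these scripts also use combining marks (vowel signs and the virama), which you can match with \p{M} (Mark). Also match against the string directly: encoding it to UTF-8 and reading the bytes back through asCharBuffer() reinterprets pairs of bytes as UTF-16 chars and garbles the text.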

System.out.println("स्वागत".matches("[\\pL\\pM]+"));
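
For comparison, here is a minimal sketch that runs the three variants side by side: letters only, letters plus marks, and the encode/asCharBuffer round trip from the question. The class name IndicLetterCheck and its helper methods are made up for illustration, and the sample word is written with Unicode escapes only so that it compiles regardless of the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class IndicLetterCheck {
    // Letters only: rejects text that contains combining marks.
    private static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");
    // Letters plus combining marks (vowel signs, virama, and so on).
    private static final Pattern LETTERS_AND_MARKS = Pattern.compile("[\\p{L}\\p{M}]+");

    static boolean isLettersOnly(String s) {
        return LETTERS_ONLY.matcher(s).matches();
    }

    static boolean isLettersAndMarks(String s) {
        return LETTERS_AND_MARKS.matcher(s).matches();
    }

    // Reproduces the round trip from the question: the UTF-8 bytes are
    // reinterpreted two at a time as UTF-16 chars, so the text changes.
    static String roundTripLikeQuestion(String s) {
        return StandardCharsets.UTF_8.encode(s).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        // The Devanagari word from the answer, written as escapes.
        String word = "\u0938\u094D\u0935\u093E\u0917\u0924";
        String garbled = roundTripLikeQuestion(word);

        System.out.println("letters only:      " + isLettersOnly(word));
        System.out.println("letters and marks: " + isLettersAndMarks(word));
        System.out.println("round trip equal:  " + garbled.equals(word));
        System.out.println("round trip match:  " + isLettersAndMarks(garbled));
    }
}
```

Only the second check should accept the word. The round-tripped string no longer equals the original, so it fails whichever pattern is used.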
Youssef Oujamaa answered Oct 02 '22 21:10