Java REGEX code to validate Indian language characters not working?

Question

Why is the following code not working (it returns false) with Indian languages?

System.out.println(Charset.forName("UTF-8").encode("అనువాద")
                .asCharBuffer().toString().matches("\\p{L}+"));

System.out.println(Charset.forName("UTF-8").encode("स्वागत")
                .asCharBuffer().toString().matches("\\p{L}+"));

System.out.println(Charset.forName("UTF-8").encode("நல்வரவு")
                .asCharBuffer().toString().matches("\\p{L}+"));

All three statements print false. What is the problem with this regex? How can I validate characters from any Unicode script?

Youssef Oujamaa · Accepted Answer

\p{L} (Letter) only matches letters, but Indic scripts also use combining marks such as dependent vowel signs and the virama. Those belong to the Mark category, which you can match with \p{M}.

System.out.println("स्वागत".matches("[\\pL\\pM]+"));
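
To make the difference easier to see, here is a minimal sketch that runs both patterns against the three words from the question. The class name ScriptWordCheck and the helpers isWord and viaByteBuffer are made up for this illustration. The words are written as Unicode escapes so the file compiles regardless of source encoding. The sketch also reproduces the encode(...).asCharBuffer() round trip from the question: as far as I can tell, that step reinterprets the UTF-8 bytes as 16-bit chars, so the regex never sees the original text and it is safer to match the String directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class ScriptWordCheck {
    // Letters plus combining marks (vowel signs, virama and similar).
    private static final Pattern LETTERS_AND_MARKS =
            Pattern.compile("[\\p{L}\\p{M}]+");

    // Hypothetical helper: true if the text consists only of letters and marks.
    static boolean isWord(String text) {
        return LETTERS_AND_MARKS.matcher(text).matches();
    }

    // Reproduces the round trip from the question: the UTF-8 bytes are
    // viewed as 16-bit chars, which yields a different string.
    static String viaByteBuffer(String text) {
        return StandardCharsets.UTF_8.encode(text).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        // The three sample words from the question, as Unicode escapes.
        String telugu = "\u0C05\u0C28\u0C41\u0C35\u0C3E\u0C26";
        String hindi = "\u0938\u094D\u0935\u093E\u0917\u0924";
        String tamil = "\u0BA8\u0BB2\u0BCD\u0BB5\u0BB0\u0BB5\u0BC1";

        for (String word : new String[] {telugu, hindi, tamil}) {
            boolean lettersOnly = word.matches("\\p{L}+");
            boolean lettersAndMarks = isWord(word);
            System.out.println("letters only: " + lettersOnly
                    + ", letters and marks: " + lettersAndMarks);
        }

        // The round-tripped string has a different length than the original,
        // so it is not the text that was meant to be validated.
        System.out.println(hindi.length() + " vs " + viaByteBuffer(hindi).length());
    }
}
```

For each of the three words, the letters-only pattern should report false and the letters-and-marks pattern true. Note that this pattern still rejects spaces, digits and punctuation, so it validates a single word rather than a whole sentence.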

Tags: java, regex, unicode, utf-8

Suren Raju

