Why does the following code not work (it returns false) with Indian languages?
System.out.println(Charset.forName("UTF-8").encode("అనువాద")
.asCharBuffer().toString().matches("\\p{L}+"));
System.out.println(Charset.forName("UTF-8").encode("स्वागत")
.asCharBuffer().toString().matches("\\p{L}+"));
System.out.println(Charset.forName("UTF-8").encode("நல்வரவு")
.asCharBuffer().toString().matches("\\p{L}+"));
All of the above calls print false. What is the problem with this regex? How can I validate any Unicode character in the world?
\p{L} (the Letter category) only matches letters, but these scripts also use combining marks such as vowel signs and viramas, which you can match with \p{M} (the Mark category):
System.out.println("स्वागत".matches("[\\pL\\pM]+"));
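For comparison, here is a minimal sketch that runs all three strings from the question through both patterns. The class name ScriptWordCheck and the helper isWord are made up for illustration. The sketch matches the strings directly: the question's Charset.forName("UTF-8").encode(...).asCharBuffer() step reinterprets the UTF-8 bytes as 16-bit chars, so the regex there never sees the original text and that round trip is best dropped.

```java
import java.util.regex.Pattern;

class ScriptWordCheck {
    // Letters plus combining marks (hypothetical constant name).
    private static final Pattern LETTERS_AND_MARKS =
            Pattern.compile("[\\p{L}\\p{M}]+");

    // Hypothetical helper: true if s consists only of letters and marks.
    static boolean isWord(String s) {
        return LETTERS_AND_MARKS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"అనువాద", "स्वागत", "நல்வரவு"};
        for (String s : samples) {
            // Compare the letters-only pattern with letters plus marks.
            System.out.println(s
                    + " letters only: " + s.matches("\\p{L}+")
                    + ", letters and marks: " + isWord(s));
        }
    }
}
```

Compiling the pattern once is only a convenience here; String.matches with the same character class behaves the same way for a one-off check.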