In Java, I'm looking for a regular expression that accepts any Persian (or Arabic) letter but rejects Persian (or Arabic) digits. To match letters only, I found a very good regular expression:
[\u0600-\u065F\u066A-\u06EF\u06FA-\u06FF]
It is correct and works for me, but we know that we can use \\p{L}+
as a regular expression that accepts letters from every language, and in my case (Arabic/Persian) I can modify it and use [\\p{InArabic}]+$.
But [\\p{InArabic}]+$
accepts not only all Arabic (Persian) letters but also Arabic digits, such as ۱ ۲.
So my question is: how can I modify [\\p{InArabic}]+$
to accept only letters and no digits? In other words, how can I restrict [\\p{InArabic}]+$
so that it rejects any digit?
Please note that the Persian (Arabic) digits look like this: ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۰
You can use the following regex:
"[\\p{InArabic}&&\\PN]"
\p{InArabic} matches any character in the Unicode block Arabic (U+0600 to U+06FF).
\PN matches any character that does not belong to any Number category (note the capital P).
Intersecting the two sets gives the desired result: both digit ranges (U+0660 to U+0669 and U+06F0 to U+06F9) are excluded.
// Print every code point in the Arabic block and whether the class accepts it
for (int i = 0x600; i <= 0x6ff; i++) {
    String c = String.valueOf((char) i);
    System.out.println(Integer.toString(i, 16) + " " + c.matches("[\\p{InArabic}&&\\PN]"));
}
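To validate a whole string rather than one character at a time, the same class can take a + quantifier. The following is only a minimal sketch and not part of the original answer; the class name ArabicNonNumberCheck and the method isArabicNonNumber are made up for illustration. The test strings are written as Unicode escapes to avoid right-to-left rendering problems in source code.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class ArabicNonNumberCheck {
    // One or more characters from the Arabic block that are in no Number category.
    private static final Pattern ARABIC_NON_NUMBER =
            Pattern.compile("[\\p{InArabic}&&\\PN]+");

    static boolean isArabicNonNumber(String s) {
        // matches() must consume the entire input, so no ^ or $ anchors are needed.
        return ARABIC_NON_NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String salam = "\u0633\u0644\u0627\u0645";   // four Arabic letters
        String withDigits = salam + "\u06F1\u06F2";  // same word followed by Persian digits 1 and 2

        System.out.println(isArabicNonNumber(salam));      // true
        System.out.println(isArabicNonNumber(withDigits)); // false
        System.out.println(isArabicNonNumber("hello"));    // false
    }
}
```

Keep in mind that \PN removes only the Number categories, so other non-letter characters in the block, such as the Arabic comma (U+060C), would still be accepted by this pattern.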
You can use character class subtraction, a rather obscure feature that Java expresses as an intersection with a negated class:
[\p{InArabic}&&[^۰-۹]]
Working example: http://ideone.com/jChGem
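One caveat, assuming the digits typed in the class above are the Persian forms (U+06F0 to U+06F9): the subtraction then removes only that range, and the Arabic-Indic digits (U+0660 to U+0669) still match. The sketch below compares that pattern with a variant that subtracts both ranges. It is an illustration under those assumptions; the class and constant names are hypothetical and do not come from the linked example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ArabicDigitSubtraction {
    // Arabic block minus the Extended Arabic-Indic (Persian) digits U+06F0 to U+06F9 only.
    static final Pattern PERSIAN_DIGITS_REMOVED =
            Pattern.compile("[\\p{InArabic}&&[^\\u06F0-\\u06F9]]+");

    // Arabic block minus both digit ranges: U+0660 to U+0669 and U+06F0 to U+06F9.
    static final Pattern ALL_DIGITS_REMOVED =
            Pattern.compile("[\\p{InArabic}&&[^\\u0660-\\u0669\\u06F0-\\u06F9]]+");

    static boolean matchesFully(Pattern p, String s) {
        // matches() requires the whole string to match the pattern.
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String persianOne = "\u06F1"; // Extended Arabic-Indic digit one
        String arabicOne = "\u0661";  // Arabic-Indic digit one

        System.out.println(matchesFully(PERSIAN_DIGITS_REMOVED, persianOne)); // false
        System.out.println(matchesFully(PERSIAN_DIGITS_REMOVED, arabicOne));  // true
        System.out.println(matchesFully(ALL_DIGITS_REMOVED, arabicOne));      // false
    }
}
```

If the input can only ever contain Persian digits, the shorter class is enough; listing both ranges is the safer choice when Arabic keyboards may be involved.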