
Regular expression for Persian (Arabic) letters without any numbers

In Java, I'm looking for a regular expression that accepts any Persian (or Arabic) letters but rejects Persian (or Arabic) numbers. To match only letters, I found a very good regular expression:

[\u0600-\u065F\u066A-\u06EF\u06FA-\u06FF]

Although it is correct and works for me, we know that \\p{L}+ accepts all letters from all languages in the world, and in my case (Arabic, Persian) I can modify it and use [\\p{InArabic}]+$.

But [\\p{InArabic}]+$ accepts not only all Arabic (Persian) letters but also Arabic numbers, like ۱ ۲.

So my question is: how can I modify [\\p{InArabic}]+$ to accept only letters and not numbers? In other words, how can I restrict [\\p{InArabic}]+$ so that it does not accept any numbers?

Please note that the Persian (Arabic) numbers look like this: ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۰

Elyas Hadizadeh asked May 08 '15 04:05


2 Answers

You can use the following regex:

"[\\p{InArabic}&&\\PN]"

\p{InArabic} matches any character in the Unicode block Arabic (U+0600 to U+06FF).

\PN matches any character that does not belong to any of the Number categories (note the capital P).

Intersecting the two sets gives the desired result: both digit ranges (U+0660 to U+0669 and U+06F0 to U+06F9) are excluded.

Testing code

// Print every code point in the Arabic block and whether the regex accepts it.
for (int i = 0x600; i <= 0x6ff; i++) {
    String c = String.valueOf((char) i);
    System.out.println(Integer.toString(i, 16) + " " + c.matches("[\\p{InArabic}&&\\PN]"));
}
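
To validate a whole string rather than one character at a time, the same intersection can be wrapped in a compiled pattern with a + quantifier. The following is only a sketch: the class and method names are hypothetical, and the second pattern (intersecting with \p{L} instead of \PN) is an assumption of mine about what "letters only" should mean, not part of the answer above. Note that \PN removes only numbers, so Arabic punctuation such as the comma U+060C still matches the first pattern, while the \p{L} variant also rejects combining marks such as vowel signs.

```java
import java.util.regex.Pattern;

public class ArabicNoNumbers {
    // The answer's character class, applied to one or more characters.
    private static final Pattern NOT_NUMBER =
            Pattern.compile("[\\p{InArabic}&&\\PN]+");

    // Stricter variant (an assumption, not from the answer): letters only.
    private static final Pattern LETTERS_ONLY =
            Pattern.compile("[\\p{InArabic}&&\\p{L}]+");

    // Hypothetical helper: every character is in the Arabic block and is not a number.
    static boolean hasNoNumbers(String s) {
        return NOT_NUMBER.matcher(s).matches();
    }

    // Hypothetical helper: every character is a letter in the Arabic block.
    static boolean isLettersOnly(String s) {
        return LETTERS_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "\u0633\u0644\u0627\u0645";            // four Arabic letters
        System.out.println(hasNoNumbers(word));               // true
        System.out.println(hasNoNumbers(word + "\u06F1"));    // false: Persian digit
        System.out.println(hasNoNumbers(word + "\u0663"));    // false: Arabic-Indic digit
        System.out.println(hasNoNumbers(word + "\u060C"));    // true: Arabic comma is not a number
        System.out.println(isLettersOnly(word + "\u060C"));   // false: comma is not a letter
    }
}
```

Which of the two patterns fits depends on whether punctuation and vowel marks should count as valid input.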
nhahtdh answered Nov 04 '22 22:11


You can use character class subtraction, which is a rather obscure feature:

[\p{InArabic}&&[^۰-۹]]

Working example: http://ideone.com/jChGem
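
As a Java string literal the backslash has to be doubled. Also note that the class above excludes only the Persian digits ۰-۹ (U+06F0 to U+06F9). The sketch below, with a hypothetical class and method name, additionally excludes the Arabic-Indic digits (U+0660 to U+0669); that second range is my addition, on the assumption that both digit sets should be rejected.

```java
import java.util.regex.Pattern;

public class ArabicMinusDigits {
    // Arabic block intersected with "anything except the two digit ranges".
    // The regex engine itself resolves the \uXXXX escapes inside the pattern.
    private static final Pattern NO_DIGITS = Pattern.compile(
            "[\\p{InArabic}&&[^\\u06F0-\\u06F9\\u0660-\\u0669]]+");

    // Hypothetical helper: true when s is non-empty and contains only
    // Arabic-block characters that are not in either digit range.
    static boolean matches(String s) {
        return NO_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "\u0633\u0644\u0627\u0645";        // four Arabic letters
        System.out.println(matches(word));                // true
        System.out.println(matches(word + "\u06F2"));     // false: Persian digit
        System.out.println(matches(word + "\u0662"));     // false: Arabic-Indic digit
    }
}
```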

Kobi answered Nov 04 '22 21:11