I am working with MVC and I am new to it. I want to check that input values contain only Persian characters, using [RegularExpression] validation.
I think I need a Regex that checks a range of Unicode code points, but I don't know how to find the Unicode range of Persian characters. Am I right about using a Regex for this? What do you suggest, and how can I find the Unicode range for Persian?
Persian characters are within the range: [\u0600-\u06FF]
Try:
Regex.IsMatch(value, @"^[\u0600-\u06FF]+$")
Use the range from the first letter to the last letter of the Persian alphabet. I think something like this:
"^[آ-ی]+$"
Regex.IsMatch(Text, @"^([\u0600-\u06FF]+\s?)+$")
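This only covers the standard Arabic block, but Persian adds four more letters to the Arabic alphabet:
ژ \u0698 (\uFB8A is its isolated presentation form)
پ \u067E
چ \u0686
گ \u06AF
The four base code points already fall inside \u0600-\u06FF; only the presentation form \uFB8A lies outside it. To accept that as well, you should use:
^[\u0600-\u06FF\uFB8A\u067E\u0686\u06AF]+$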
If you also want to match the zero-width non-joiner (ZWNJ), add this to the character class too:
\u200C
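To make the effect of the zero-width non-joiner concrete, here is a minimal sketch in C#. The class name ZwnjCheck and the sample word are only illustrative assumptions; the first pattern is the one from the answer above and the second just adds \u200C to it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class ZwnjCheck
{
    // Pattern from the answer above, unchanged.
    public static readonly Regex WithoutZwnj =
        new Regex(@"^[\u0600-\u06FF\uFB8A\u067E\u0686\u06AF]+$");

    // Same pattern with U+200C (zero-width non-joiner) added to the class.
    public static readonly Regex WithZwnj =
        new Regex(@"^[\u0600-\u06FF\uFB8A\u067E\u0686\u06AF\u200C]+$");

    // A Persian word that contains a ZWNJ, written with escapes
    // so that the invisible character is explicit: meem, yeh, ZWNJ, reh, waw, meem.
    public const string Sample = "\u0645\u06CC\u200C\u0631\u0648\u0645";
}
```

With these definitions, ZwnjCheck.WithoutZwnj.IsMatch(ZwnjCheck.Sample) returns false and ZwnjCheck.WithZwnj.IsMatch(ZwnjCheck.Sample) returns true, because U+200C is outside the 0600-06FF block.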
\u0600-\u06FF or [آ-ی] are simply WRONG, i.e. \u0600-\u06FF contains 209 more characters than you need, and it includes numbers too!
Use ^[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی]+$ for letters.
Use ^[۰۱۲۳۴۵۶۷۸۹]+$ for numbers.
Use [ ٌ ًّ َ ِ ُ ْ ]
for vowel marks (diacritics).
You can also use a union of those. You may additionally want to add other Arabic letters such as Hamza (ء) to your character set.
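Since the question is about [RegularExpression] validation in MVC, the explicit letter set above could be wired into a model roughly like this. It is only a sketch: PersonModel, FirstName, PersianValidationDemo and the error message are made-up names for illustration, and the check runs through the standard System.ComponentModel.DataAnnotations Validator rather than a real MVC request.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical model; the class and property names are assumptions.
public class PersonModel
{
    // Explicit Persian letters only, copied from the pattern above.
    public const string PersianLetters = "^[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی]+$";

    [RegularExpression(PersianLetters, ErrorMessage = "Only Persian letters are allowed.")]
    public string FirstName { get; set; }
}

public static class PersianValidationDemo
{
    // Runs the data-annotation validators on a model instance,
    // similar in spirit to what model binding does in MVC.
    public static bool IsValid(string firstName)
    {
        var model = new PersonModel { FirstName = firstName };
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(
            model, new ValidationContext(model), results, validateAllProperties: true);
    }
}
```

Here PersianValidationDemo.IsValid("سلام") passes, while "hello" or a value containing a digit fails. Note that RegularExpressionAttribute treats null and empty strings as valid, so a mandatory field would also need [Required].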
This answer exists to fix a common misconception. Code points 0600 through 06FF do not denote the Persian / Farsi alphabet (and neither does [آ-ی]):
[\u0600-\u0605 ؐ-ؚ\u061Cـ ۖ-\u06DD ۟-ۤ ۧ ۨ ۪-ۭ ً-ٕ ٟ ٖ-ٞ ٰ ، ؍ ٫ ٬ ؛ ؞ ؟ ۔ ٭ ٪ ؉ ؊ ؈ ؎ ؏
۞ ۩ ؆ ؇ ؋ ٠۰ ١۱ ٢۲ ٣۳ ٤۴ ٥۵ ٦۶ ٧۷ ٨۸ ٩۹ ءٴ۽ آ أ ٲ ٱ ؤ إ ٳ ئ ا ٵ ٮ ب ٻ پ ڀ
ة-ث ٹ ٺ ټ ٽ ٿ ج ڃ ڄ چ ڿ ڇ ح خ ځ ڂ څ د ذ ڈ-ڐ ۮ ر ز ڑ-ڙ ۯ س ش ښ-ڜ ۺ ص ض ڝ ڞ
ۻ ط ظ ڟ ع غ ڠ ۼ ف ڡ-ڦ ٯ ق ڧ ڨ ك ک-ڴ ػ ؼ ل ڵ-ڸ م۾ ن ں-ڽ ڹ ه ھ ہ-ۃ ۿ ەۀ وۥ ٶ
ۄ-ۇ ٷ ۈ-ۋ ۏ ى يۦ ٸ ی-ێ ې ۑ ؽ-ؿ ؠ ے ۓ \u061D]
255 characters fall in this range. The Farsi alphabet has 32 letters; together with the Farsi digits that makes 42. If we add the vowel marks (originally Arabic and rarely used in Farsi), the Tanvin (ً , ٍِ , ٌ ) and the Tashdid (ّ ), which are both a subset of Arabic diacritics rather than Farsi, we end up with 46 characters. This means:
\u0600-\u06FF contains 209 more characters than you need!
۷ (code point 06F7) is the Farsi representation of the number 7, and ٧ (code point 0667) is the Arabic representation of the same number. ۶ is the Farsi representation of the number 6, and ٦ is the Arabic representation of the same number. All of them reside in code points 0600 through 06FF.
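The digit example can be checked directly. The following sketch (DigitCheck is a hypothetical class name) compares the whole-block range with a class limited to the Persian digits U+06F0 through U+06F9, using escapes so the two look-alike sevens cannot be confused.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class DigitCheck
{
    // The whole 0600-06FF block, as used by the range-based answers.
    public static readonly Regex ArabicBlock = new Regex(@"^[\u0600-\u06FF]+$");

    // Persian digits only: U+06F0 (zero) through U+06F9 (nine).
    public static readonly Regex PersianDigits = new Regex(@"^[\u06F0-\u06F9]+$");

    public const string PersianSeven = "\u06F7"; // Farsi digit seven
    public const string ArabicSeven = "\u0667";  // Arabic-Indic digit seven
}
```

DigitCheck.ArabicBlock.IsMatch returns true for both PersianSeven and ArabicSeven, whereas DigitCheck.PersianDigits.IsMatch returns true only for PersianSeven.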
The shapes of the Persian digits four (۴), five (۵), and six (۶) are different from the shapes used in Arabic, and the other numbers have different codepoints.
You can also see a number of other characters in this range that do not exist in Farsi / Persian at all, and nobody wants to accept them while validating a first name or surname.
[آ-ی] includes 117 characters as well, which is far more than anyone needs for validation. You can see them all using Unicode CLDR.