.NET Regular Expression to match any kind of letter from any language

Which regular expression can I use to match (allow) any kind of letter from any language?

I need to match any letter including any diacritics (e.g., á, ü, ñ) and exclude any kind of symbol (math symbols, currency signs, dingbats, box-drawing characters, etc.) and punctuation characters.

I'm using ASP.NET MVC 2 with .NET 4. I’ve tried this annotation in my view model

[RegularExpression(@"\p{L}*", ...

and this one

[RegularExpression(@"\p{L}\p{M}*", ...

but client-side validation rejects accented characters.

UPDATE: Thank you for all your answers. Your suggestions work, but only in .NET; the problem here is that the same regex is also used for client-side validation in JavaScript.

I had to go with

[^0-9_\|°¬!#\$%/\\\(\)\?¡¿\+\{\}\[\]:\.\,;@ª^\*<>=&]

which is very ugly and does not cover all scenarios but is the closest thing to what I need.
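For context on the client-side failure, here is a minimal sketch. It assumes the validation script builds a plain JavaScript `RegExp` from the pattern string without any flags (an assumption; the actual script is not shown here). Without the `u` flag, JavaScript does not read `\p{L}` as a Unicode property escape, so the pattern ends up matching the literal text `p{L}`. Engines that implement ES2018 or later do accept property escapes when the `u` flag is set.

```javascript
// Pattern string as it would be emitted for client-side validation.
const pattern = "^\\p{L}*$";

// Assumption: the validation script builds the RegExp without flags.
const withoutFlag = new RegExp(pattern);

// With the "u" flag (ES2018+), \p{L} is a Unicode property escape.
const withFlag = new RegExp(pattern, "u");

// "\u00e1" is á, "\u00f1and\u00fa" is ñandú.
for (const input of ["\u00e1", "\u00f1and\u00fa", "p{L}", "abc1"]) {
  console.log(
    JSON.stringify(input),
    "no flag:", withoutFlag.test(input),
    "u flag:", withFlag.test(input)
  );
}
```

Without the flag, only the literal text `p{L}` passes, which would explain why accented input is rejected in the browser while the same pattern works in .NET.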

pedro asked Jun 01 '10 12:06


2 Answers

\p{L}* should match "any kind of letter from any language". It should work; I used it in an i18n-proof uppercase/lowercase recognition regex in .NET.
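To illustrate what the letter category accepts and rejects, here is a small sketch in JavaScript (chosen because the same pattern ends up in client-side validation). It assumes an ES2018+ engine with the `u` flag, and `isLettersOnly` is a hypothetical helper name, not something from the question. The pattern also allows combining marks (`\p{M}`) after each letter, so decomposed accents such as `e` followed by U+0301 are accepted too.

```javascript
// Hypothetical helper: true when the text consists only of letters,
// each optionally followed by combining marks (decomposed diacritics).
function isLettersOnly(text) {
  return /^(?:\p{L}\p{M}*)*$/u.test(text);
}

const samples = [
  "fa\u00e7ade",   // precomposed ç
  "e\u0301",       // e + combining acute accent
  "Ελληνικά",      // Greek
  "日本語",         // Japanese
  "abc1",          // contains a digit
  "a+b",           // contains a math symbol
  "\u20ac",        // euro sign (currency symbol)
  "a.b",           // contains punctuation
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), isLettersOnly(sample));
}
```

The first four samples pass and the last four are rejected, since digits, symbols, and punctuation belong to other Unicode categories.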

Jan Willem B answered Oct 20 '22 01:10


Your problem is more likely that the pattern is not anchored: the input only has to contain one alphabetic character somewhere, because an unanchored regex succeeds on any string in which it finds a match.

By adding ^ as a prefix and $ as a suffix, the whole string has to comply with your regex. So this probably works:

^\p{L}*$

RegexBuddy explains:

  1. ^ Assert position at beginning of the string
  2. \p{L} A character with the Unicode property 'letter' (any kind of letter from any kind of language)
     2a. Between zero and unlimited times, as many as possible (greedy)
  3. $ Assert position at the end of the string
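
The effect of the anchors can be sketched in JavaScript as follows, assuming an ES2018+ engine so that `\p{L}` is available with the `u` flag. The `nonEmpty` variant with `+` is an addition for illustration only; it shows that `*` also lets the empty string through.

```javascript
const unanchored = /\p{L}*/u;   // may match an empty run anywhere in the input
const anchored = /^\p{L}*$/u;   // the whole input must consist of letters
const nonEmpty = /^\p{L}+$/u;   // illustrative variant: at least one letter

// "Jos\u00e9" is José.
for (const input of ["Jos\u00e9", "123", "Jos\u00e9!", ""]) {
  console.log(
    JSON.stringify(input),
    "unanchored:", unanchored.test(input),
    "anchored:", anchored.test(input),
    "nonEmpty:", nonEmpty.test(input)
  );
}
```

The unanchored pattern accepts every input, including "123", because zero letters is already a match. The anchored pattern accepts only the all-letter input and the empty string.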

Jan Jongboom answered Oct 20 '22 00:10