
Unicode characters in Regex

Tags: c#, .net, regex

I have a regular expression:

return Regex.IsMatch(_customer.FirstName, @"^[A-Za-z][A-Za-z0-9@#%&\'\-\s\.\,*]*$");

Now, some customers have a fada over a vowel in their first name or surname, for example: Brendán

Note the fada over the a, which you can type by holding down Ctrl+Alt and pressing a.

I have tried adding these characters into the regular expression but I get an error when the program tries to compile.

The only way I can allow the user to enter a character with a fada is to remove the regular expression completely, which means the user can enter anything they want.

Is there any way to use the above expression and somehow allow the following characters?

á
é
í
ó
ú
asked Dec 17 '13 at 17:12 by Kev




1 Answer

Just for reference, you don't need to escape the apostrophe ', the comma , or the period . inside a character class [], and you can avoid escaping the dash - by placing it at the beginning or end of the class.
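To see the escaping point in practice, here is a small sketch (assuming C# 9+ top-level statements; the sample names are made up for illustration) that runs the question's pattern next to a simplified version with the unnecessary escapes removed and the dash moved to the end of the class. Both patterns should give the same result for every sample:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question, with every punctuation character escaped.
var original = new Regex(@"^[A-Za-z][A-Za-z0-9@#%&\'\-\s\.\,*]*$");

// Same character class without the unnecessary escapes; the dash sits at the end,
// so it is a literal '-' rather than a range operator.
var simplified = new Regex(@"^[A-Za-z][A-Za-z0-9@#%&'\s.,*-]*$");

// Illustrative inputs only.
string[] samples = { "Brendan", "O'Brien", "Mary-Kate", "J. Smith, Jr.", "9Lives", "Brendán" };
foreach (string s in samples)
    Console.WriteLine("{0,-14} original={1} simplified={2}", s, original.IsMatch(s), simplified.IsMatch(s));
```

Neither pattern accepts Brendán, because á is outside A-Za-z.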

You can use \p{L} which matches any kind of letter from any language. See the example below:

using System;
using System.Text.RegularExpressions;

string[] names = { "Brendán", "Jóhn", "Jason" };
Regex rgx = new Regex(@"^\p{L}+$");
foreach (string name in names)
    Console.WriteLine("{0} {1} a valid name.", name, rgx.IsMatch(name) ? "is" : "is not");

// Brendán is a valid name.
// Jóhn is a valid name.
// Jason is a valid name.

Or simply add the characters you want to allow to your character class []:

@"^[a-zA-Z0-9áéíóú@#%&',.\s-]+$"
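A sketch that combines both suggestions while keeping the question's original rules (the first character must be a letter, followed by letters, digits, whitespace and the listed punctuation). It replaces [A-Za-z] with \p{L}. As an assumption going beyond the answer, it also normalizes input to composed form (NFC) first, because an accented vowel can arrive as a plain vowel followed by a combining accent (for example a + U+0301), and that combining mark is not matched by \p{L}. The helper name IsValidName is hypothetical, and the code assumes C# 9+ top-level statements:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// The question's rules with [A-Za-z] swapped for \p{L} (any Unicode letter).
// 0-9 is kept instead of \d because in .NET \d also matches non-ASCII digits.
var namePattern = new Regex(@"^\p{L}[\p{L}0-9@#%&'\s.,*-]*$");

// Hypothetical helper: compose "a" + U+0301 into "á" (U+00E1) before matching,
// since the combining accent on its own is a mark (Mn), not a letter (L).
bool IsValidName(string name) =>
    namePattern.IsMatch(name.Normalize(NormalizationForm.C));

string[] candidates = { "Brendán", "Seán Ó Briain", "Brenda\u0301n", "ÉAMONN", "9Lives", "Brendán!" };
foreach (string candidate in candidates)
    Console.WriteLine("{0} {1} a valid name.", candidate, IsValidName(candidate) ? "is" : "is not");
```

Unlike the explicit áéíóú list, \p{L} also accepts capitals such as É and Ó and letters from other alphabets; if that is too permissive, list the exact characters instead.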
answered Oct 09 '22 at 04:10 by hwnd
