form validation allow only english alphabet characters

I'd like to prevent non-English characters from being entered in my form input. For example, all Chinese, Japanese and Cyrillic characters, but also single characters like: à, â, ù, û, ü, ô, î, ê. Would this be possible? Do I have to set up a locale on my MVC application, or should I rather just do a regex textbox validation? Just a side note: I want to be able to enter numbers and other characters. The restriction should apply only to letters.

Please advise, thank you.

bobek asked Mar 08 '13 23:03


2 Answers

For this you have to use Unicode character properties and blocks. Each Unicode code point has some properties assigned to it, e.g. that the code point is a letter. Blocks are ranges of code points.

For more details, see:

  • regular-expressions.info for some general information about Unicode code points, character properties, scripts and blocks

  • MSDN for the supported properties and blocks in .net

Those Unicode Properties and blocks are written \p{Name}, where "Name" is the name of the property or block.

When it is an uppercase "P" like this \P{Name}, then it is the negation of the property/block, i.e. it matches anything else.
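As a minimal sketch of the difference between \p{Name} and \P{Name}, assuming a plain console program (the class name PropertyDemo is only illustrative), the letter property can be tested against a single character like this:

```csharp
using System;
using System.Text.RegularExpressions;

class PropertyDemo
{
    static void Main()
    {
        // \p{L} matches exactly one letter, regardless of script.
        Console.WriteLine(Regex.IsMatch("\u00FC", @"^\p{L}$")); // True  (ü is a letter)
        Console.WriteLine(Regex.IsMatch("7", @"^\p{L}$"));      // False (7 is a digit)

        // \P{L} is the negation: exactly one character that is NOT a letter.
        Console.WriteLine(Regex.IsMatch("7", @"^\P{L}$"));      // True
        Console.WriteLine(Regex.IsMatch("\u00FC", @"^\P{L}$")); // False
    }
}
```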

Some of the available properties (only a short excerpt):

  • L ==> All letter characters.
  • Lu ==> Letter, Uppercase
  • Ll ==> Letter, Lowercase
  • N ==> All numbers. This includes the Nd, Nl, and No categories.
  • Pc ==> Punctuation, Connector
  • P ==> All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
  • Sm ==> Symbol, Math

Some of the available blocks (only a short excerpt):

  • 0000 - 007F ==> IsBasicLatin
  • 0400 - 04FF ==> IsCyrillic
  • 1000 - 109F ==> IsMyanmar
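To illustrate how block names behave, here is a small sketch using two of the blocks listed above. The sample strings and the class name BlockDemo are my own choices, not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

class BlockDemo
{
    static void Main()
    {
        // Every character of the Russian word lies in 0400 - 04FF.
        Console.WriteLine(Regex.IsMatch("Привет", @"^\p{IsCyrillic}+$"));        // True
        Console.WriteLine(Regex.IsMatch("Hello", @"^\p{IsCyrillic}+$"));         // False

        // Plain ASCII text lies entirely in 0000 - 007F.
        Console.WriteLine(Regex.IsMatch("Hello", @"^\p{IsBasicLatin}+$"));       // True
        // é is U+00E9, which is outside the Basic Latin block.
        Console.WriteLine(Regex.IsMatch("H\u00E9llo", @"^\p{IsBasicLatin}+$")); // False
    }
}
```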

What I used in the solution:

\P{L} is a character property that matches any character that is not a letter ("L" for Letter)

\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F

So your regex would be:

^[\P{L}\p{IsBasicLatin}]+$

In plain words:

This matches the whole string, from start to end (^ and $), when it contains at least one character and every character is either a non-letter or a character from the ASCII table (code points 0000 - 007F).

A short C# test method:

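using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Test inputs: plain ASCII, ASCII with symbols, and two strings with non-English letters
        string[] myStrings = { "Foobar",
            "Foo@bar!\"§$%&/()",
            "Föobar",
            "fóÓè"
        };

        // Every character must be either a non-letter or a Basic Latin (ASCII) character
        Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");

        foreach (string str in myStrings) {
            Match result = reg.Match(str);
            if (result.Success)
                Console.Out.WriteLine("matched ==> " + str);
            else
                Console.Out.WriteLine("failed ==> " + str);
        }

        Console.ReadLine(); // keep the console window open
    }
}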

Prints:

matched ==> Foobar
matched ==> Foo@bar!"§$%&/()
failed ==> Föobar
failed ==> fóÓè

stema answered Oct 15 '22 21:10


You can use a RegularExpression attribute on your ViewModel to restrict that:

public class MyViewModel
{
    [System.ComponentModel.DataAnnotations.RegularExpression("[a-zA-Z]+")]
    public string MyEntry
    {
       get;
       set;
    }
}
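Note that the attribute only accepts a value when the pattern matches the whole string, so "[a-zA-Z]+" also rejects digits and punctuation. If those should stay allowed, as the question asks, one option is to put the pattern from the other answer into the same attribute. The following is only a sketch: EntryModel, AttributeDemo, IsValid and the error message are hypothetical names, and the model is validated by hand with Validator.TryValidateObject rather than through the MVC pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical view model, used only for this illustration.
public class EntryModel
{
    // Non-letters are allowed; letters must come from the Basic Latin block.
    [RegularExpression(@"^[\P{L}\p{IsBasicLatin}]+$",
        ErrorMessage = "Only English letters are allowed.")]
    public string Entry { get; set; }
}

public class AttributeDemo
{
    // Runs the data annotation validators on a model holding the given value.
    public static bool IsValid(string value)
    {
        var model = new EntryModel { Entry = value };
        var context = new ValidationContext(model, null, null);
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(model, context, results, true);
    }

    static void Main()
    {
        Console.WriteLine(IsValid("abc123!"));      // True
        Console.WriteLine(IsValid("F\u00F6obar")); // False (ö is a non-English letter)
    }
}
```

If client-side validation is enabled, the same pattern is typically also handed to the browser's JavaScript regex engine, which may not understand .NET block names such as IsBasicLatin, so that part is worth verifying in your own setup.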
codingbiz answered Oct 15 '22 20:10