Regex for non-alphabets and non-numerals

Tags:

c#

regex

Please provide a solution for writing the following regular expression in C#.NET: I need a regex that matches non-alphabetic characters (anything other than a to z and A to Z) and non-numeric characters (anything other than 0 to 9). In other words, the reverse of a regex that matches letters and digits.

Kindly suggest a solution.

sukumar asked Nov 18 '09 09:11


2 Answers

You can use a negated character class here:

[^a-zA-Z0-9]

The regex above matches a single character that is not a basic Latin (ASCII) lowercase letter, uppercase letter, or digit.

The ^ at the start of the character class (the part between [ and ]) negates the whole class, so it matches any character that is not listed in it, rather than one that is.
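A minimal C# sketch of the negated class on its own, assuming .NET 5 or later (C# 9 top-level statements); the input string is made up for illustration. Each match is exactly one character:

```csharp
using System;
using System.Text.RegularExpressions;

// The negated class from the answer: matches ONE character that is not
// an ASCII letter (a-z, A-Z) or an ASCII digit (0-9).
var nonAlnum = new Regex("[^a-zA-Z0-9]");

// Hypothetical sample input.
string input = "ab-12 c!";

// Matches() returns every single-character match in the input.
foreach (Match m in nonAlnum.Matches(input))
{
    Console.WriteLine($"'{m.Value}' at index {m.Index}");
}
// Prints:
// '-' at index 2
// ' ' at index 5
// '!' at index 7
```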

To make it useful, you probably want one of these:

  1. Zero or more such characters

    [^a-zA-Z0-9]*
    

    The asterisk (*) here signifies that the preceding part can be repeated zero or more times.

  2. One or more such characters

    [^a-zA-Z0-9]+
    

    The plus (+) here signifies that the preceding part can be repeated one or more times.

  3. A complete (possibly empty) string, consisting only of such characters

    ^[^a-zA-Z0-9]*$
    

    Here ^ and $ act as anchors, matching the start and end of the string respectively. This ensures that the entire string consists of characters outside the class, with no other characters before or after them.

  4. A complete (non-empty) string, consisting only of such characters

    ^[^a-zA-Z0-9]+$
    

Note, however, that this only excludes the basic Latin letters and the ASCII digits; it does not (and cannot) reject letters or digits from other scripts. The string аеΒ is perfectly valid under the regular expressions above, because it consists of Cyrillic and Greek letters. There are further pitfalls: the precomposed string á (U+00E1) passes, while the visually identical decomposed form (the letter a followed by the combining acute accent U+0301) does not, because it contains a plain ASCII a.

So negated character classes have to be taken with care at times.

I could also use numerals from other scripts if I wanted to: ١٢٣ :-)
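A C# sketch of these pitfalls against variant 4, assuming .NET 5 or later. The strings are written with \u escapes so the exact code points are unambiguous; the Cyrillic/Greek look-alikes are my choice of example, not necessarily the exact characters above:

```csharp
using System;
using System.Text.RegularExpressions;

// Variant 4: the whole string must be non-empty and contain
// no ASCII letters or ASCII digits.
var onlyNonAscii = new Regex("^[^a-zA-Z0-9]+$");

string cyrillicGreek = "\u0430\u0435\u0392"; // Cyrillic a, Cyrillic ie, Greek capital Beta (look-alikes of a, e, B)
string precomposed   = "\u00E1";             // á as a single code point
string decomposed    = "a\u0301";            // ASCII a followed by COMBINING ACUTE ACCENT
string arabicDigits  = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three

Console.WriteLine(onlyNonAscii.IsMatch(cyrillicGreek)); // True: none of these are in a-z/A-Z
Console.WriteLine(onlyNonAscii.IsMatch(precomposed));   // True: U+00E1 is outside a-z
Console.WriteLine(onlyNonAscii.IsMatch(decomposed));    // False: contains a plain ASCII a
Console.WriteLine(onlyNonAscii.IsMatch(arabicDigits));  // True: these digits are outside 0-9
```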

You can use the character class

[^\p{L&}\p{Nd}]

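if you need to handle the cases above. Note that \p{L&} (cased letters) is Perl/PCRE syntax that .NET does not recognize, so in C# write [^\p{L}\p{Nd}] instead: \p{L} matches a letter in any script (including uncased ones such as Arabic or Chinese), and \p{Nd} matches a decimal digit in any script.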
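A C# sketch of the Unicode-aware version, assuming .NET 5 or later and using the .NET spelling \p{L} in place of \p{L&}. Letters and digits from any script are now rejected, and both forms of á are treated the same:

```csharp
using System;
using System.Text.RegularExpressions;

// \p{L}  = any Unicode letter (all scripts)
// \p{Nd} = any Unicode decimal digit (all scripts)
// The negated class therefore leaves punctuation, whitespace, symbols, and marks.
var onlyNonLetterDigit = new Regex(@"^[^\p{L}\p{Nd}]+$");

Console.WriteLine(onlyNonLetterDigit.IsMatch("!@# $%"));             // True: symbols and a space
Console.WriteLine(onlyNonLetterDigit.IsMatch("\u0430\u0435\u0392")); // False: Cyrillic/Greek letters
Console.WriteLine(onlyNonLetterDigit.IsMatch("\u0661\u0662\u0663")); // False: Arabic-Indic digits
Console.WriteLine(onlyNonLetterDigit.IsMatch("\u00E1"));             // False: precomposed letter
Console.WriteLine(onlyNonLetterDigit.IsMatch("a\u0301"));            // False: contains the letter a
```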

Joey answered Oct 04 '22 22:10


Just negate the class:

[^A-Za-z0-9]
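A common use of this negated class in C# is stripping everything except ASCII letters and digits with Regex.Replace. A minimal sketch, assuming .NET 5 or later; the sample string is made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Delete every character that is not an ASCII letter or digit.
string raw = "Hello, World! 2009-11-18";
string cleaned = Regex.Replace(raw, "[^A-Za-z0-9]", "");

Console.WriteLine(cleaned); // HelloWorld20091118
```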

beggs answered Oct 04 '22 23:10