
.NET Regex That Matches Strings With Any Non-ASCII Character in Them

Looking for some black magic that will match any string with "weird" characters in it. Standard ASCII characters are fine. Everything else isn't.

This is for sanitizing various web forms.

John Shedletsky asked Aug 24 '10 23:08



2 Answers

This matches any character outside the ASCII range:

[^\x00-\x7F]

Some "weird" characters such as \x00 (NUL) will still get through, because they are valid ASCII.
For reference, see the ASCII table.
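
To see the character class in action, here is a minimal sketch. It uses Python's `re` module rather than .NET, purely so it is easy to run; the pattern is the same one shown above. The helper name `has_non_ascii` is made up for illustration.

```python
import re

# Any character outside U+0000..U+007F, i.e. anything that is not ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def has_non_ascii(text: str) -> bool:
    """Return True if text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None

print(has_non_ascii("plain form input"))  # False
print(has_non_ascii("caf\u00e9"))         # True: U+00E9 is outside ASCII
print(has_non_ascii("tab\tand\x00nul"))   # False: control characters are still ASCII
```

The last call shows the caveat from the answer: NUL and other control characters pass this check, so they need a separate rule if the form should reject them.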

NullUserException answered Oct 15 '22 09:10


Use [^\p{IsBasicLatin}] for exactly what is asked for, [^\x00-\x7F] if you prefer concision over self-documentation, or \p{C} to clear out format and control characters without affecting other non-ASCII characters (and with greater concision still).
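
The \p{C} option behaves quite differently from the two non-ASCII patterns, so a rough sketch may help. Python's `re` module does not support `\p{...}` classes, so this swaps in `unicodedata.category` to approximate what \p{C} selects. Treat it as an illustration of the idea, not as the .NET behaviour itself; the function name `strip_other_category` is hypothetical.

```python
import unicodedata

def strip_other_category(text: str) -> str:
    """Drop characters whose Unicode general category starts with "C"
    (control, format, surrogate, private use, unassigned).
    This approximates removing matches of \\p{C}; it is not a regex."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("C")
    )

# U+00EF is a letter, U+200B (zero-width space) is a format character,
# and \x07 (BEL) is a control character.
sample = "na\u00efve\u200b text\x07"
cleaned = strip_other_category(sample)
print(cleaned == "na\u00efve text")  # True: the letter survives, the other two are gone
```

Note that tab and newline are control characters too, so this approach removes them as well. That may or may not be what you want for multi-line form fields.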

Jon Hanna answered Oct 15 '22 08:10