Looking for some black magic that will match any string with "weird" characters in it. Standard ASCII characters are fine. Everything else isn't.
This is for sanitizing various web forms.
RegexBuddy's regex engine is fully Unicode-based starting with version 2.0.0.
What character in regex is used to match any character except a newline? A metacharacter is a symbol with a special meaning inside a regex. The metacharacter dot ( . ) matches any single character except newline \n (same as [^\n] ).
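To make the dot's behaviour concrete, here is a minimal sketch using Python's built-in `re` module (the choice of Python and the variable names are assumptions for illustration, not part of the original answer). It shows that the dot skips the newline by default and only matches it when the `re.DOTALL` flag is set.

```python
import re

text = "a\nb"

# By default, "." matches every character except the newline.
default_matches = re.findall(r".", text)

# With re.DOTALL, "." matches the newline as well.
dotall_matches = re.findall(r".", text, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```

Other regex flavors have an equivalent switch, often called "single-line" or "dot matches newline" mode.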
To match a character that has a special meaning in regex, you need to use an escape sequence, that is, prefix the character with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (backslash).
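The escaping rule can be checked with a short Python sketch (Python's `re` module is assumed here, and the sample strings are made up for illustration). It also shows `re.escape`, which adds the backslashes automatically when the literal text is not known in advance.

```python
import re

# An escaped dot matches only a literal ".", not any character.
escaped_hits_dot = re.fullmatch(r"a\.b", "a.b") is not None
escaped_hits_other = re.fullmatch(r"a\.b", "axb") is not None

# re.escape() backslash-escapes every metacharacter in a literal string.
literal = "1+1=(2)\\"
pattern = re.escape(literal)
literal_matches_itself = re.fullmatch(pattern, literal) is not None

print(escaped_hits_dot, escaped_hits_other, literal_matches_itself)
```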
The regular expression [\x20-\x7E] matches any printable ASCII character. ASCII assigns a numeric code to each character, and the ASCII table extends from NUL (code 0) to DEL (code 127). The printable characters extend from code 32 (space) to code 126 (tilde, ~).
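As a rough sketch in Python, the printable range from code 32 to code 126 can be written as the character class `[\x20-\x7E]`. The helper name `is_printable_ascii` is hypothetical, and treating the empty string as valid is an assumption you may want to change for form validation.

```python
import re

# Space (code 32) through tilde (code 126), repeated zero or more times.
PRINTABLE_ASCII = re.compile(r"[\x20-\x7E]*")

def is_printable_ascii(text: str) -> bool:
    """Return True if every character in text is printable ASCII."""
    return PRINTABLE_ASCII.fullmatch(text) is not None

print(is_printable_ascii("Hello, world! ~"))
print(is_printable_ascii("tab\there"))
print(is_printable_ascii("café"))
```

Note that tabs and newlines are control characters, so this check rejects them; widen the class if multi-line input is expected.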
This matches any character outside the ASCII range:
[^\x00-\x7F]
There are still some "weird" characters like \x00 (NULL), but they are valid ASCII.
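A minimal Python sketch of this check follows (the function name `has_non_ascii` is made up for illustration). It flags any string containing a character above code 127 and, as noted above, lets ASCII control characters such as NULL through.

```python
import re

# Matches one character outside the 7-bit ASCII range (codes 0 to 127).
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def has_non_ascii(text: str) -> bool:
    """Return True if text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None

print(has_non_ascii("plain text"))
print(has_non_ascii("naïve"))
print(has_non_ascii("nul\x00byte"))
```

In Python 3.7 and later, `not text.isascii()` gives the same answer without a regex.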
For reference, see the ASCII table
Use [^\p{IsBasicLatin}] for what is asked for, [^\x00-\x7F] for concision over self-documentation, or \p{C} for clearing out format and control characters without hurting other non-ASCII characters (and with greater concision yet).
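Python's built-in `re` module does not support `\p{...}` property classes (the third-party `regex` package does), so the sketch below approximates `\p{C}` with the standard `unicodedata` module instead. The function name `strip_control_and_format` is hypothetical. It drops every character whose Unicode general category starts with "C" and keeps everything else, including accented letters.

```python
import unicodedata

def strip_control_and_format(text: str) -> str:
    """Remove characters in the Unicode 'Other' categories (Cc, Cf, Cs, Co, Cn)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("C")
    )

# A zero-width space (U+200B, category Cf) and a NULL (category Cc) are removed.
cleaned = strip_control_and_format("a\u200bb\x00c")
print(cleaned)
print(strip_control_and_format("café"))
```

Be aware that newlines and tabs are also control characters, so this removes them too; exempt them explicitly if the form accepts multi-line input.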