[^\x20-\x7E]
I saw this pattern used for a regular expression in which the goal was to remove non-ascii characters from a string. What does it mean?
\x20: matches an ASCII character given in hexadecimal representation (exactly two digits). \cC: matches an ASCII control character. For example, \cC is control-C.
'?' is also a quantifier, short for {0,1}. It means "match zero or one of the group preceding this question mark." In other words, the part preceding the question mark is optional. E.g.: pattern = re.compile(r'(\d{2}-)?\
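As a small illustration of the optional quantifier described above, here is a sketch in Python. The pattern `colou?r` is a made-up example chosen for this illustration and is not the truncated pattern quoted above.

```python
import re

# 'u?' makes the letter u optional: zero or one occurrence is accepted.
# The pattern is a hypothetical example, not taken from the original post.
optional_u = re.compile(r'colou?r')

matches_color = optional_u.fullmatch('color') is not None
matches_colour = optional_u.fullmatch('colour') is not None
matches_colouur = optional_u.fullmatch('colouur') is not None

print(matches_color, matches_colour, matches_colouur)
```

The first two spellings match because `u?` allows zero or one `u`; the third fails because two are present.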
*. * , returns strings beginning with any combination and any amount of characters (the first asterisk), and can end with any combination and any amount of characters (the last asterisk). This selects every single string available.
Each character in a regular expression (that is, each character in the string describing its pattern) is either a metacharacter, having a special meaning, or a regular character that has a literal meaning.
It says something like: all characters that are not (^) in the range \x20-\x7E (hex 0x20 to 0x7E).
According to http://www.asciitable.com/, those are the characters from space to ~.
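One way to check this reading is to build the range by hand and confirm that the pattern finds nothing to match in it. This is only a sketch using Python's `re` module; the variable names are chosen for illustration.

```python
import re

# The pattern from the question: anything NOT between \x20 and \x7E.
non_printable_ascii = re.compile(r'[^\x20-\x7E]')

# Every code point from 0x20 (space) through 0x7E (~), inclusive.
printable = ''.join(chr(cp) for cp in range(0x20, 0x7F))

first_char, last_char = printable[0], printable[-1]
match_in_range = non_printable_ascii.search(printable)

print(len(printable), repr(first_char), repr(last_char), match_in_range)
```

The range holds 95 characters, starts at the space and ends at the tilde, and the search comes back empty because every one of them is excluded by the negated class.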
It means: match any character that is not a printing ASCII character.
Printing characters include a to z, A to Z, 0 to 9, the space, and symbols such as ",;$#% etc.
Broken down: ^ means "not"; \x20 is the hex code for the space character; - means "to"; \x7e is the hex code for the ~ (tilde) character.
All the ASCII printing characters fall between these two.
So this pattern matches non-ASCII characters as well as ASCII control (non-printing) characters such as bell, tab, null and others.
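To see that behaviour in practice, here is a minimal Python sketch that removes everything the pattern matches. The function name `strip_non_printable_ascii` and the sample string are assumptions made for this example.

```python
import re

def strip_non_printable_ascii(text: str) -> str:
    """Remove every character outside the range space..tilde."""
    return re.sub(r'[^\x20-\x7E]', '', text)

# Sample containing an accented letter, a tab, a bell and a null byte.
sample = 'caf\u00e9\tbell\x07 null\x00 ok~'
cleaned = strip_non_printable_ascii(sample)

print(repr(cleaned))  # 'cafbell null ok~'
```

Note that the accented é is dropped along with the control characters, and that newlines (\x0A) would be removed too, since they also sit below \x20.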
Look at man ascii on a unix system to see which characters it matches.
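In perl, you could also write this as [^ -~], using the literal space and tilde instead of their hex codes.
A related option is the POSIX class [[:^cntrl:]], but it is not equivalent: it matches any non control character, including extended ascii (e.g. accented characters) and unicode, so it describes the characters you would keep rather than the ones you would remove. To match the control characters themselves, use [[:cntrl:]].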
You may not want to restrict yourself to just ascii, since non US locations often use valid printing characters outside this small range, e.g. øüéåç...
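If the goal is only to get rid of control characters while keeping accented and other non-ASCII letters, one possible approach in Python is to filter on the Unicode general category instead of an ASCII range. This is a sketch, not the method from the original answers; the function name `strip_control_chars` is an assumption.

```python
import unicodedata

def strip_control_chars(text: str) -> str:
    """Drop code points in Unicode category Cc (control); keep everything else."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Cc')

# Sample with non-ASCII letters plus a bell and a tab.
sample = 'sm\u00f8rg\u00e5s\x07bord\t'
cleaned = strip_control_chars(sample)

print(cleaned)  # smørgåsbord
```

Here ø and å survive because they are letters (category Ll), while the bell and the tab are removed. Whether tabs and newlines should be kept is a design choice that this sketch does not address.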