I have a question that came to mind when I answered this post about matching ASCII characters except alphanumerics.
This is what I have tried but it's not correct.
(?=[\x00-\x7F])[^a-zA-Z0-9]
regex101 demo
I am not looking for a solution; I just want to know where I am wrong. What is the meaning of this regex pattern?
Thanks
As per my understanding, `(?=[\x00-\x7F])` is used to check for an ASCII character and `[^a-zA-Z0-9]` is used to exclude alphanumeric characters. So finally it will match any ASCII character except alphanumerics. Am I right?
The isalnum() method returns True if all the characters in a string are alphanumeric, meaning letters (a-z) and digits (0-9). Examples of characters that are not alphanumeric: a space, !
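As a quick illustration, assuming the method meant here is Python's `str.isalnum()`, the sketch below checks a few made-up sample strings. Note that in Python 3 the method also accepts non-ASCII letters such as `é`, and it returns False for an empty string.

```python
# Sample strings are illustrative only, not taken from the original post.
samples = ["abc123", "abc 123", "hello!", "é", "£", ""]

# Map each sample to the result of str.isalnum().
results = {s: s.isalnum() for s in samples}

for text, is_alnum in results.items():
    print(repr(text), is_alnum)
```

So `isalnum()` alone is not an ASCII-only check; it answers a broader "letter or digit" question.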
The American Standard Code for Information Interchange (ASCII) is the standard alphanumeric code for keyboards and a host of other data interchange tasks. Letters, numbers, and single keystroke commands are represented by a seven-bit word.
Non-alphanumeric characters are characters that are not numbers (0-9) or alphabetic characters.
The approach is to use the String.replaceAll method to replace all the non-alphanumeric characters with an empty string.
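The text above names Java's `String.replaceAll`. The same idea can be sketched in Python with `re.sub`, which is a swapped-in equivalent and not the method the text refers to. The helper name `strip_non_alnum` is hypothetical.

```python
import re

def strip_non_alnum(text: str) -> str:
    """Remove every character that is not an ASCII letter or digit.

    Hypothetical helper for illustration; re.sub plays the role that
    String.replaceAll plays in Java.
    """
    return re.sub(r"[^a-zA-Z0-9]", "", text)

cleaned = strip_non_alnum("££££A$££0#$%")
print(cleaned)
```

Because the character class is negated, anything outside `a-z`, `A-Z`, and `0-9` is removed, including non-ASCII characters such as `£`.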
The regex parser goes to each character in the string and checks it with the regex.
The first part, `(?=...)`, is called a 'lookahead', and it asks whether the next character is whatever is specified (that is, `[\x00-\x7F]`). It doesn't move the character pointer.
The next part, `[^a-zA-Z0-9]`, then requires that same character to be non-alphanumeric, and this time it does move the character pointer.
So it does precisely what you told it to; that is, match any non-alphanumeric ASCII character.
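A minimal sketch of that behaviour, using Python's `re` module as an assumed engine (the question does not say which flavour was used) and the sample string from this thread:

```python
import re

# The pattern from the question: an ASCII lookahead followed by a
# negated alphanumeric class.
pattern = re.compile(r"(?=[\x00-\x7F])[^a-zA-Z0-9]")
sample = "££££A$££0#$%"

# Every match is a single ASCII character that is not a letter or digit.
matches = pattern.findall(sample)
print(matches)

# The lookahead on its own succeeds without consuming anything:
# the match ends at position 0, where it started.
lookahead_only = re.match(r"(?=[\x00-\x7F])", "$")
print(lookahead_only.end())
```

The lookahead only filters positions; the consuming part of the match is always the single character accepted by `[^a-zA-Z0-9]`.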
It does not match `£` in `££££A$££0#$%` because `£` is not ASCII. If you want to match ANY character that is non-alphanumeric, you're probably looking for this regex:
`[^a-zA-Z0-9]`
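To see the difference between the two patterns side by side, here is a small comparison, again assuming Python's `re` module and the same sample string:

```python
import re

sample = "££££A$££0#$%"

# Pattern from the question: non-alphanumeric AND ASCII.
ascii_only = re.findall(r"(?=[\x00-\x7F])[^a-zA-Z0-9]", sample)

# Pattern from the answer: non-alphanumeric, ASCII or not.
any_non_alnum = re.findall(r"[^a-zA-Z0-9]", sample)

print(ascii_only)
print(any_non_alnum)
```

Only the second pattern picks up `£`, since nothing in it restricts the match to the ASCII range.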
See http://www.regular-expressions.info/lookaround.html and other pages on the site for more info.