
Match ASCII characters except alphanumerics

Tags: java, regex

I have a question that came to mind when I answered this post about matching ASCII characters except alphanumerics.

This is what I have tried but it's not correct.

(?=[\x00-\x7F])[^a-zA-Z0-9]

regex101 demo

I am not looking for a solution; I just want to know where I went wrong. What does this regex pattern mean?

Thanks


As I understand it, (?=[\x00-\x7F]) checks for an ASCII character and [^a-zA-Z0-9] excludes alphanumeric characters, so together they should match any ASCII character except alphanumerics. Am I right?

Asked Aug 17 '14 at 13:08 by Braj



1 Answer

The regex engine tries the pattern at each position in the string, one after another.

The first part, (?=...), is called a 'lookahead'. It asserts that the next character matches whatever is specified inside it (here, [\x00-\x7F]) without consuming that character, so the match position does not move.
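To see that a lookahead is zero-width, here is a minimal sketch that runs the lookahead on its own and records where each match starts and ends. The class name `LookaheadDemo`, the helper `matchSpans`, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: returns "start-end" for every match of the lookahead alone.
    static List<String> matchSpans(String input) {
        Matcher m = Pattern.compile("(?=[\\x00-\\x7F])").matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // "\u00A3" is the pound sign, written as an escape to avoid source-encoding issues.
        // Prints [0-0, 2-2]: empty matches before 'a' and 'b', none before the pound sign.
        System.out.println(matchSpans("a\u00A3b"));
    }
}
```

Every match has the same start and end index, which is what "doesn't move the pointer" means in practice.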

The next part, [^a-zA-Z0-9], requires that this same character is not alphanumeric, and it does consume the character, moving the match position forward.

So it does precisely what you told it to do: it matches any non-alphanumeric ASCII character.
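As a quick check in Java, the sketch below applies the pattern from the question to the sample string from the demo. The class name `AsciiNonAlnumDemo` and the helper `findAll` are illustrative names, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsciiNonAlnumDemo {
    // The pattern from the question: assert ASCII, then consume one non-alphanumeric character.
    static final Pattern ASCII_NON_ALNUM =
            Pattern.compile("(?=[\\x00-\\x7F])[^a-zA-Z0-9]");

    // Hypothetical helper: collects every match of the pattern in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = ASCII_NON_ALNUM.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // "\u00A3" is the pound sign; the string is ££££A$££0#$% written with escapes.
        String input = "\u00A3\u00A3\u00A3\u00A3A$\u00A3\u00A30#$%";
        // Prints [$, #, $, %]: the pound signs, 'A' and '0' are all skipped.
        System.out.println(findAll(input));
    }
}
```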

It does not match £ in ££££A$££0#$% because £ is not ASCII. If you want to match ANY character that is non-alphanumeric, you're probably looking for this regex:

`[^a-zA-Z0-9]`
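For comparison, a minimal sketch of the same test without the lookahead, again on the sample string from the question. The class name `AnyNonAlnumDemo` and the helper `findAll` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnyNonAlnumDemo {
    // No ASCII restriction: any character outside a-z, A-Z, 0-9 matches.
    static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Hypothetical helper: collects every match of the pattern in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = NON_ALNUM.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The string ££££A$££0#$% with the pound sign written as an escape.
        String input = "\u00A3\u00A3\u00A3\u00A3A$\u00A3\u00A30#$%";
        List<String> matches = findAll(input);
        // 10 matches: six pound signs plus $, #, $, %; only 'A' and '0' are skipped.
        System.out.println(matches.size() + " matches: " + matches);
    }
}
```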

See http://www.regular-expressions.info/lookaround.html and other pages on the site for more info.

Answered Sep 29 '22 at 00:09 by oink