
Why is the following regex [^0-9!a-zA-z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\}~@\\.]+ not working with String.split?

Tags:

java

regex

I have this regular expression:

[^0-9!a-zA-z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\}~@\\.]+

and I am trying to use it to split the email address out of this input:

[Email][email protected]

But the following code in java:

String fileStr = "[Email][email protected]";

String invalidCharacters = "[^0-9!a-zA-z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\}~@\\.]+";

String[] tokens = fileStr.split(invalidCharacters);

for (String token:tokens) {
    if (token.contains("@")) {
        System.out.println(token);
    }
}

is giving this output:

[Email][email protected]

I am completely clueless, because I expected the invalidCharacters pattern to cover [ and ] as well, so the input should have been split at the brackets.

Md. Reazul Karim asked Dec 21 '22 13:12


1 Answer

You have A-z in your character class. In ASCII (and Unicode) order, the characters [, \, ], ^, _ and ` all sit between upper-case Z and lower-case a, so the range A-z includes them. Because the class is negated, [ and ] are therefore treated as valid rather than invalid characters, and split never breaks the string at them. Presumably you meant A-Z instead.
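A minimal sketch of the difference, assuming a made-up input "[Email]user@example.com" (the address in the question is obscured, so this one is purely illustrative). The class name SplitDemo, the constants BROKEN and FIXED, and the helper tokensWithAt are hypothetical names introduced here; the only change between the two patterns is A-z versus A-Z.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Pattern from the question: the range A-z also covers [ \ ] ^ _ and `
    static final String BROKEN =
        "[^0-9!a-zA-z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\}~@\\.]+";

    // Same pattern with the range corrected to A-Z
    static final String FIXED =
        "[^0-9!a-zA-Z#\\$%&'\\*\\+\\-/=\\?\\^_`\\{\\|\\}~@\\.]+";

    // Splits the input on the given pattern and keeps only tokens containing '@'
    static List<String> tokensWithAt(String input, String regex) {
        List<String> result = new ArrayList<>();
        for (String token : input.split(regex)) {
            if (token.contains("@")) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the address from the question
        String fileStr = "[Email]user@example.com";

        // With A-z the brackets count as valid, so nothing is split off
        System.out.println(tokensWithAt(fileStr, BROKEN));

        // With A-Z the brackets are separators and only the address remains
        System.out.println(tokensWithAt(fileStr, FIXED));
    }
}
```

With the corrected range, ^, _ and ` are still accepted because the pattern lists them explicitly; only [, \ and ] change from valid to invalid.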

Ian Roberts answered Feb 16 '23 00:02