
Why isn't char of type SPACE_SEPARATOR recognized as whitespace?

I have a String like "12 345 678" and I wanted to remove the whitespace (so that I can convert it to an int). So I did the usual: myString.replaceAll("\\s", ""), but what a surprise! It did nothing; the space was still there.

When I investigated further, I figured out that this space character is of type Character.SPACE_SEPARATOR (Character.getType(myString.charAt(<positionOfSpaceChar>))).

What I don't get is why this obvious space character (from Unicode category Zs, http://www.fileformat.info/info/unicode/category/Zs/list.htm) isn't recognized as whitespace (not even by Character.isWhitespace(char)).

Reading through the Java API docs hasn't helped (so far).

Note: In the end, I just want to remove that character, and I will probably find a way to do it, but I'm really interested in an explanation of why it behaves like this. Thanks.

rax asked Jan 13 '23 03:01


1 Answer

Your problem is that, by default, \s is defined as [ \t\n\x0B\f\r]. What you want to use is \p{javaWhitespace}, which matches every character for which java.lang.Character.isWhitespace() returns true.

Note, however, that a non-breaking space is not considered whitespace by Character.isWhitespace(). Characters of type Character.SPACE_SEPARATOR are generally whitespace, but '\u00A0', '\u2007' and '\u202F' are excluded because they are non-breaking. Since you say Character.isWhitespace() returns false for your SPACE_SEPARATOR character, it is presumably one of these three. If you want to include non-breaking spaces, then list those 3 characters explicitly in addition to \p{javaWhitespace}. It's kind of a pain, but that's the way it is.
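To make that concrete, here is a minimal sketch of a character class that combines \p{javaWhitespace} with the three non-breaking spaces. The class name SpaceStripper, the method stripSpaces and the sample input are made up for illustration; the input assumes the separator is U+00A0, which may not be the character you actually have.

```java
import java.util.regex.Pattern;

final class SpaceStripper {
    // \p{javaWhitespace} covers everything Character.isWhitespace() accepts.
    // The three non-breaking spaces (U+00A0, U+2007, U+202F) are added
    // explicitly because isWhitespace() rejects them.
    private static final Pattern ANY_SPACE =
            Pattern.compile("[\\p{javaWhitespace}\\u00A0\\u2007\\u202F]");

    // Hypothetical helper: returns the input with all such characters removed.
    static String stripSpaces(String input) {
        return ANY_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumed sample input: digit groups separated by non-breaking spaces.
        String withNbsp = "12\u00A0345\u00A0678";

        // The default \s does not match the non-breaking spaces, so nothing is removed.
        System.out.println(withNbsp.replaceAll("\\s", "").length()); // prints 10

        System.out.println(stripSpaces(withNbsp)); // prints 12345678
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.replaceAll would otherwise do.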

Actually, in your specific case of converting to int, I'd recommend:

myString = myString.replaceAll("\\D", "");

to strip out everything that is not a digit.
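As a rough sketch of how that fits into the conversion itself (the class DigitsOnly, the method parseGroupedInt and the sample input are hypothetical, not from the question):

```java
final class DigitsOnly {
    // Hypothetical helper: drops every non-digit character, then parses the rest.
    static int parseGroupedInt(String input) {
        // By default \D matches anything outside 0-9, so any kind of space is removed.
        String digits = input.replaceAll("\\D", "");
        // Throws NumberFormatException if no digits remain or the value does not fit in an int.
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        // Assumed sample input: digit groups separated by non-breaking spaces.
        System.out.println(parseGroupedInt("12\u00A0345\u00A0678")); // prints 12345678
    }
}
```

Keep in mind that \D also removes a leading minus sign and any decimal separator, so this only suits non-negative whole numbers.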

Old Pro answered Jan 25 '23 21:01