I have a String like "12 345 678" and I wanted to remove the whitespace (so that I can convert it to an int). So I did the usual: myString.replaceAll("\\s", ""); but to my surprise, it did nothing: the space was still there.
When I investigated further, I figured out that this space character is of type Character.SPACE_SEPARATOR (checked with Character.getType(myString.charAt(<positionOfSpaceChar>))).
What I don't get is why this obvious space character (from Unicode category Zs, http://www.fileformat.info/info/unicode/category/Zs/list.htm) isn't recognized as whitespace, not even by Character.isWhitespace(char).
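A minimal sketch that reproduces the behaviour described above. It assumes the mystery character is U+00A0 (NO-BREAK SPACE), which is common in locale-formatted numbers; the question doesn't say which Zs character it actually is. The class and method names are hypothetical.

```java
// Hypothetical demo class. Assumption: the "space" is U+00A0 (NO-BREAK SPACE).
public class SpaceCheck {

    // True if the default "\\s" class, i.e. [ \t\n\x0B\f\r], matches c.
    static boolean matchesBackslashS(char c) {
        return String.valueOf(c).matches("\\s");
    }

    public static void main(String[] args) {
        String myString = "12\u00A0345\u00A0678"; // non-breaking spaces, not U+0020
        char c = myString.charAt(2);              // the first "space"

        // Unicode category Zs
        System.out.println(Character.getType(c) == Character.SPACE_SEPARATOR); // true
        // isWhitespace() deliberately excludes non-breaking spaces
        System.out.println(Character.isWhitespace(c));                          // false
        // "\\s" does not match it either...
        System.out.println(matchesBackslashS(c));                               // false
        // ...so replaceAll("\\s", "") leaves the string unchanged
        System.out.println(myString.replaceAll("\\s", "").equals(myString));    // true
    }
}
```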
Reading through the Java API docs hasn't helped (so far).
Note: In the end, I just want to remove that character, and I will probably find a way to do it, but I'm really interested in an explanation of why it behaves like this. Thanks
Your problem is that \s is defined as [ \t\n\x0B\f\r]. What you want to use is \p{javaWhitespace}, which is defined as all characters for which java.lang.Character.isWhitespace() is true.
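A small sketch comparing the two classes. It uses U+2003 (EM SPACE) as the example because it is a Zs character that Character.isWhitespace() does accept; the class and method names are hypothetical.

```java
// Hypothetical demo: "\\s" vs "\\p{javaWhitespace}" on an EM SPACE (U+2003).
public class JavaWhitespaceDemo {

    // Removes every character for which Character.isWhitespace() is true.
    static String stripJavaWhitespace(String s) {
        return s.replaceAll("\\p{javaWhitespace}", "");
    }

    public static void main(String[] args) {
        String s = "12\u2003345 678"; // an EM SPACE, then an ordinary space

        // "\\s" removes the ordinary space but leaves the EM SPACE in place
        System.out.println(s.replaceAll("\\s", "").equals("12\u2003345678")); // true
        // "\\p{javaWhitespace}" removes both
        System.out.println(stripJavaWhitespace(s));                           // 12345678
    }
}
```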
This may well apply in your case: a non-breaking space is not considered whitespace. Characters of type Character.SPACE_SEPARATOR are generally whitespace, but '\u00A0', '\u2007' and '\u202F' are excluded because they are non-breaking. Since Character.isWhitespace() returned false for a SPACE_SEPARATOR character, it has to be one of these three. If you want to include non-breaking spaces, include those 3 characters explicitly in addition to \p{javaWhitespace}. It's kind of a pain, but that's the way it is.
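A minimal sketch of that combined character class. The class and method names are hypothetical; the regex uses Pattern's own \uhhhh escapes for the three non-breaking spaces.

```java
import java.util.regex.Pattern;

// Hypothetical helper: \p{javaWhitespace} plus the three non-breaking spaces
// that Character.isWhitespace() excludes (U+00A0, U+2007, U+202F).
public class AllSpaces {

    private static final Pattern ALL_SPACES =
            Pattern.compile("[\\p{javaWhitespace}\\u00A0\\u2007\\u202F]+");

    static String stripAllSpaces(String s) {
        return ALL_SPACES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // NO-BREAK SPACE, NARROW NO-BREAK SPACE and an ordinary space
        String s = "12\u00A0345\u202F678 9";
        System.out.println(stripAllSpaces(s)); // 123456789
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which String.replaceAll would do.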
Actually, in your specific case of converting to int, I'd recommend myString.replaceAll("\\D", ""); to strip out everything that is not a digit.
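A short sketch of this approach, followed by Integer.parseInt. The class and method names are hypothetical. By default \D means [^0-9], so this also removes a leading minus sign or a decimal separator, and an input with no digits leaves an empty string, on which parseInt throws NumberFormatException.

```java
// Hypothetical helper: keep only ASCII digits, then parse.
public class DigitsOnly {

    // Caveat: "-5" becomes 5, because the minus sign is not a digit.
    static int parseDigits(String s) {
        return Integer.parseInt(s.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("12\u00A0345\u00A0678")); // 12345678
    }
}
```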