After extracting some HTML from a web page, I have some elements whose text ends in an unknown whitespace character that is not matched by "\\s":
<span>Monday </span>
In Java, to check what this character is, I am doing:
String s = getTheSpanContent();
char c = s.charAt(s.length() - 1);
int i = (int) c;
and the value of i is 160.
Does anyone know what this is, and how I can match it?
Thanks
It's a non-breaking space (U+00A0, which is 160 in decimal). According to the Pattern Javadocs, \\s matches [ \t\n\x0B\f\r] by default, so you'll have to explicitly add \xA0 to your regex if you want to match it.
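To make that concrete, here is a minimal sketch of a character class that adds \xA0 to \\s and uses it to strip the trailing character. The class name NbspMatch and the helpers stripTrailing and endsWithNbsp are made up for this illustration, and the literal "Monday\u00A0" stands in for whatever getTheSpanContent() returns.

```java
import java.util.regex.Pattern;

public class NbspMatch {
    // \s (default ASCII whitespace) plus \xA0, the non-breaking space (decimal 160).
    private static final Pattern TRAILING_SPACE = Pattern.compile("[\\s\\xA0]+$");

    // Hypothetical helper: removes trailing whitespace, including non-breaking spaces.
    public static String stripTrailing(String text) {
        return TRAILING_SPACE.matcher(text).replaceAll("");
    }

    // Hypothetical helper: true if the last character is a non-breaking space.
    public static boolean endsWithNbsp(String text) {
        return !text.isEmpty() && text.charAt(text.length() - 1) == '\u00A0';
    }

    public static void main(String[] args) {
        // Stand-in for the span content from the question.
        String s = "Monday\u00A0";

        System.out.println((int) s.charAt(s.length() - 1)); // 160
        System.out.println(s.matches(".*\\s"));             // false: plain \s misses it
        System.out.println(endsWithNbsp(s));                // true
        System.out.println("[" + stripTrailing(s) + "]");   // [Monday]
    }
}
```

If you are on Java 8 or later, the predefined class \\h (horizontal whitespace) should also cover \xA0, and compiling with Pattern.UNICODE_CHARACTER_CLASS widens \\s to Unicode white space; check the Pattern Javadocs for your version before relying on either.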