Recently ran into a very odd issue where my database contains strings with what appear to be normal whitespace characters but are in fact something else.
For instance, applying trim() to the string:
"TEST "
is getting me:
"TEST "
as a result. So I copy and paste the last character in the string and:
echo ord(' ');
194
194? According to ASCII tables that should be ┬. So I'm just confused at this point. Why does this character appear to be whitespace, and how can I trim() characters like this when trim() fails?
On its own, 194 (0xC2) is not an ASCII code at all, since ASCII stops at 127. It is ┬ in code page 437 and Â (A with circumflex) in ISO-8859-1.
The ASCII code for a blank space is the decimal number 32, or the binary number 0010 0000.
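It's more likely to be the two-byte sequence 194 160 (0xC2 0xA0), which is the UTF-8 encoding of the NO-BREAK SPACE code point U+00A0 (the equivalent of the &nbsp; entity in HTML). ord() only looks at the first byte of the string you pass it, which is why it reports 194.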
It's really not a space, even though it looks like one. (You'll see it won't word-wrap, for instance.) A regular expression match for \s will match it only in Unicode mode (the u modifier in PHP's preg_* functions); a plain comparison with a space won't match, and trim() won't remove it.
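One way to confirm this is to look at the raw bytes rather than the rendered glyph. The following is a minimal sketch, assuming the string is UTF-8 and PHP 7 or later (for the \u{...} escape); the variable $s and its sample value are made up for illustration:

```php
<?php
// Assumed sample: "TEST" followed by a NO-BREAK SPACE (U+00A0), UTF-8 encoded.
$s = "TEST\u{00A0}";

// strlen() counts bytes, so the single NBSP character adds two.
echo strlen($s), "\n";          // 6

// bin2hex() shows every byte; the trailing "c2a0" is the NBSP.
echo bin2hex($s), "\n";         // 54455354c2a0

// ord() reads only the first byte of its argument, hence 194 (0xC2).
echo ord("\u{00A0}"), "\n";     // 194

// trim() with its default character list leaves both bytes in place.
var_dump(trim($s) === $s);      // bool(true)
```

If the hex dump of your own string ends in c2a0, you are dealing with a NO-BREAK SPACE and not a regular space (20).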
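To replace NO-BREAK spaces with a normal space, you should be able to do something like:
$string = str_replace("\u{00a0}", " ", $string);
or
$string = str_replace("\u{00a0}", "", $string);
to remove them. Note that the \u{...} escape takes the code point (00a0), not the UTF-8 bytes (c2a0), and needs PHP 7 or later; on older versions use "\xc2\xa0" instead.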
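For reference, the code-point escape and the raw-byte escapes name the same two bytes, so any of them can serve as the search string. A short sketch, assuming PHP 7+ and UTF-8 data; $nbsp and the sample value of $string are illustrative only:

```php
<?php
$nbsp = "\u{00A0}";                        // code point U+00A0 (PHP 7+)
var_dump($nbsp === "\xC2\xA0");            // bool(true): same two bytes
var_dump($nbsp === chr(194) . chr(160));   // bool(true)

// Sample input with one NBSP in the middle and one at the end.
$string = "one\u{00A0}two\u{00A0}";

echo str_replace($nbsp, " ", $string), "|\n";   // one two |
echo str_replace($nbsp, "", $string), "|\n";    // onetwo|
```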
You can try one of the following:
PHP trim
$foo = "TEST ";
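// trim() is byte-based, so add the two NBSP bytes to its default character list.
// Caveat: this can also damage other UTF-8 characters that share those bytes.
$foo = trim($foo, " \t\n\r\0\x0B\xC2\xA0");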
PHP str_replace
$foo = "TEST ";
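$foo = str_replace(chr(194) . chr(160), '', $foo);
IMPORTANT: Search for both bytes, chr(194).chr(160), or for "\u{00A0}" (double quotes, PHP 7+). Removing chr(194) alone leaves a stray byte 160 behind and corrupts any other character whose encoding starts with byte 194.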
PHP preg_replace
$foo = "TEST ";
// The u modifier turns on Unicode mode, so \s also matches NO-BREAK SPACE.
$foo = preg_replace('#^\s+|\s+$#u', '', $foo);
OR (I'm not sure if it will work well)
$foo = "TEST ";
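// Match the code point U+00A0 in Unicode mode; a [\xC2\xA0] class would match each byte separately.
$foo = preg_replace('#\x{00A0}#u', '', $foo);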
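Putting the pieces together, here is a sketch of a trim that also handles NO-BREAK SPACE. The function name nbsp_trim is made up for this example, and it assumes PHP 7+ and valid UTF-8 input (with the u modifier, preg_replace() returns null on invalid input, so the sketch falls back to the original string in that case):

```php
<?php
// Hypothetical helper: trims whitespace and NO-BREAK SPACE from both ends.
function nbsp_trim(string $text): string
{
    // \s covers ordinary whitespace; \x{00A0} names NBSP explicitly so the
    // pattern does not depend on how \s behaves in Unicode mode.
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);

    // preg_replace() returns null on failure (e.g. invalid UTF-8 input).
    return $result ?? $text;
}

var_dump(nbsp_trim("\u{00A0} TEST \u{00A0}") === "TEST");   // bool(true)

// "à" (U+00E0) is encoded as C3 A0; it ends in byte 0xA0 but is left intact.
var_dump(nbsp_trim("caf\u{00E0}") === "caf\u{00E0}");       // bool(true)
```

Because the pattern works on code points and not on individual bytes, it avoids the corruption risk of byte-level approaches such as passing "\xC2\xA0" to trim().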