The Google+ API appends \ufeff to the end of every "content" value in its results (I don't really know why).
What is the best way to remove this Unicode character from the JSON result? It produces a '?' in some of the output I am displaying.
Example: go to
https://developers.google.com/+/api/latest/activities/get#try-it
enter the activity id
z12pvrsoaxqlw5imi22sdd35jwvkglj5204
and click Execute. The result will be:
{
.....
"object": {
......
"content": "CONTENT OF GOOGLE PLUS POST HERE \ufeff",
......
Example PHP code that shows a '?' where the '\ufeff' is:
<?php
$data = json_decode($result_from_google_plus_api, true);
echo $data['object']['content'];
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"
echo trim($data['object']['content']);
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"
Or am I going about this the wrong way? Should I be fixing the '?' issue rather than trying to remove the '\ufeff'?
In your case, you could use this regexp:
$str = preg_replace('/\x{feff}$/u', '', $str);
That way you can exactly match that code point value and have it removed.
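As a quick check of that regexp, here is a minimal sketch. The sample string is made up to mirror the "content" value from the question, and the "\u{FEFF}" escape assumes PHP 7.0 or later. It also hints at why trim() did not help: by default trim() only strips a handful of ASCII characters (space, tab, newline, carriage return, NUL, vertical tab), and U+FEFF is three bytes in UTF-8.

```php
<?php
// Made-up sample mirroring the API result; "\u{FEFF}" needs PHP 7.0+.
$content = "CONTENT OF GOOGLE PLUS POST HERE \u{FEFF}";

// Remove a single U+FEFF at the very end; the /u modifier enables UTF-8 mode.
$clean = preg_replace('/\x{feff}$/u', '', $content);

// U+FEFF is encoded as the three bytes EF BB BF in UTF-8.
echo bin2hex(substr($content, -3)), "\n";     // efbbbf
echo strlen($content) - strlen($clean), "\n"; // 3

// trim() leaves the character in place, only the regexp removes it.
var_dump(trim($content) === $content);        // bool(true)
```

Note that the space before the U+FEFF is still there afterwards, so you may want to follow up with trim().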
In my experience there are a lot more whitespace-like characters you will want to remove. This works well for me:
# I like to call this unicodeTrim()
$str = preg_replace(
'/
^
[\pZ\p{Cc}\x{feff}]+
|
[\pZ\p{Cc}\x{feff}]+$
/ux',
'',
$str
);
I found http://www.regular-expressions.info/unicode.html a pretty good resource about the fine details:
\pZ - match any kind of whitespace or invisible separator
\p{Cc} - match control characters
\x{feff} - match BOM
I've seen regexes that suggest matching \pC instead of \p{Cc}; however, this is dangerous because \pC includes any code point to which no character has been assigned. I've had actual data (certain emojis or other stuff) removed because of this.
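To make the difference concrete, here is a small sketch using a private-use code point (U+E000, category Co), which belongs to \pC but not to \p{Cc}. The sample string is invented for illustration and the "\u{E000}" escape assumes PHP 7.0 or later.

```php
<?php
// U+E000 is a private-use code point (category Co): part of \pC, not of \p{Cc}.
$s = "data\u{E000}";

$withC  = preg_replace('/\pC+$/u', '', $s);
$withCc = preg_replace('/\p{Cc}+$/u', '', $s);

var_dump($withC === 'data'); // bool(true): the private-use character was stripped
var_dump($withCc === $s);    // bool(true): the string was left untouched
```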
But YMMV; I can't stress this enough.
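For reuse, the trimming pattern can be wrapped in a function. This is only a sketch: the name unicodeTrim() comes from the comment in the snippet above, while the type hints and the fallback on error are my own assumptions, as is the sample input.

```php
<?php
/**
 * Sketch of a helper: strips separators (\pZ), control characters (\p{Cc})
 * and U+FEFF from both ends of a UTF-8 string.
 */
function unicodeTrim(string $str): string
{
    $result = preg_replace(
        '/^[\pZ\p{Cc}\x{feff}]+|[\pZ\p{Cc}\x{feff}]+$/u',
        '',
        $str
    );

    // preg_replace() returns null on error (for example invalid UTF-8 input);
    // falling back to the original string is an assumption, adjust as needed.
    return $result ?? $str;
}

// Invented input: BOM, no-break space and space in front; space, BOM, newline at the end.
$raw = "\u{FEFF}\u{00A0} CONTENT OF GOOGLE PLUS POST HERE \u{FEFF}\n";
var_dump(unicodeTrim($raw) === 'CONTENT OF GOOGLE PLUS POST HERE'); // bool(true)
```

Whitespace inside the string is kept, because each alternative is anchored to the start or the end.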