Removing the "\ufeff" from the end of object -> content in Google+ API json result

The result from the Google+ API has \ufeff appended to the end of every "content" value (I don't really know why).

What is the best way to remove this Unicode character from the JSON result? It produces a '?' in some of the output I am displaying.

Example:

https://developers.google.com/+/api/latest/activities/get#try-it 

enter activity id

z12pvrsoaxqlw5imi22sdd35jwvkglj5204

and click Execute, result will be:

{
 .....
 "object": {
  ......
  "content": "CONTENT OF GOOGLE PLUS POST HERE \ufeff",
  ......

Example PHP code that shows a '?' where the '\ufeff' is:

<?php
$data = json_decode($result_from_google_plus_api, true);
echo $data['object']['content'];
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"
echo trim($data['object']['content']);
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"

Or am I going about this the wrong way? Should I be fixing the '?' issue rather than trying to remove the '\ufeff'?

dtbaker asked Dec 19 '22 15:12


1 Answer

In your case, you could use this regexp:

$str = preg_replace('/\x{feff}$/u', '', $str);

That way you can exactly match that code point value and have it removed.
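As a minimal sketch of why trim() does not help here: the JSON literal below is a made-up stand-in for the API response, not real Google+ output. After json_decode(), the \ufeff escape becomes the UTF-8 bytes EF BB BF, which trim() does not treat as whitespace, while the anchored regexp removes them. This assumes the decoded string is valid UTF-8, which the /u modifier requires.

```php
<?php
// Hypothetical stand-in for the API response; only the trailing \ufeff matters.
$json = '{"object":{"content":"CONTENT HERE \ufeff"}}';

$data = json_decode($json, true);
$content = $data['object']['content'];

// U+FEFF is encoded in UTF-8 as the three bytes EF BB BF.
$tail = bin2hex(substr($content, -3));

// By default trim() only strips ASCII whitespace and NUL bytes,
// so the U+FEFF at the end survives.
$trimmed = trim($content);

// The /u modifier makes PCRE treat pattern and subject as UTF-8,
// so \x{feff} matches the code point rather than a raw byte.
$clean = preg_replace('/\x{feff}$/u', '', $content);

echo $tail, "\n";
var_dump($trimmed === $content);
var_dump($clean);
```

Note that the space before the \ufeff is still there after the replacement, so you may want to call trim() on the result as well.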

In my experience there are a lot more whitespace-like characters you will want to remove. This works well for me:

# I like to call this unicodeTrim()
$str = preg_replace(
  '/
    ^
    [\pZ\p{Cc}\x{feff}]+
    |
    [\pZ\p{Cc}\x{feff}]+$
   /ux',
  '',
  $str
);

I found http://www.regular-expressions.info/unicode.html a pretty good resource about the fine details:

  • \pZ - match any kind of whitespace or invisible separator
  • \p{Cc} - match control characters
  • \x{feff} - match BOM
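If you want this as a reusable helper, something like the following should do. It is only a sketch: unicodeTrim() is just the nickname from the comment above, not a built-in function, and the sample string is invented to include one character from each of the three classes. It assumes PHP 7 or later for the "\u{...}" string escapes and the type declarations.

```php
<?php
/**
 * Strip Unicode separators, control characters and U+FEFF from both ends.
 * unicodeTrim() is a hypothetical helper name, not part of PHP.
 */
function unicodeTrim(string $str): string
{
    $result = preg_replace('/^[\pZ\p{Cc}\x{feff}]+|[\pZ\p{Cc}\x{feff}]+$/u', '', $str);

    // preg_replace() returns null on failure (for example when the input
    // is not valid UTF-8); fall back to the untouched input in that case.
    return $result ?? $str;
}

// NBSP (U+00A0, matched by \pZ), tab (matched by \p{Cc}) and U+FEFF
// surround the text; the space inside the text is not at either end.
$raw = "\u{00A0}\t CONTENT HERE \u{FEFF}";

echo '[', unicodeTrim($raw), ']', "\n";
```

Returning the original string on failure is a design choice made for this sketch; depending on your data you might prefer to throw or log instead.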

I've seen regexes that match \pC instead of \p{Cc}; however, this is dangerous because \pC also includes any code point to which no character has been assigned. I've had actual data (certain emojis and other characters) removed because of this.
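A quick way to see how much broader \pC is. This sketch does not reproduce the unassigned-code-point case (which code points count as unassigned depends on the Unicode version your PCRE was built with); instead it uses a related hazard that is easier to demonstrate: \pC also covers format characters (category Cf) such as U+200D ZERO WIDTH JOINER, which is used inside multi-part emoji sequences.

```php
<?php
// U+200D ZERO WIDTH JOINER is a format character (category Cf).
$zwj = "\u{200D}";

// \pC covers every "Other" category (Cc, Cf, Cn, Co, Cs), so it matches ZWJ.
$matchesC = preg_match('/^\pC$/u', $zwj);

// \p{Cc} covers only control characters, so ZWJ is left alone.
$matchesCc = preg_match('/^\p{Cc}$/u', $zwj);

var_dump($matchesC, $matchesCc);
```

U+FEFF itself is also in category Cf, which is why it has to be listed explicitly once you narrow the class down to \p{Cc}.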

But YMMV; I can't stress this enough.

mark answered Dec 24 '22 02:12