
Determining and removing invisible characters from a string in PHP (%E2%80%8E)

I have strings in PHP which I read from a database. The strings are URLs and at first glance they look good, but there seems to be some weird character at the end. In the address bar of the browser, the string '%E2%80%8E' gets appended to the URL, which breaks the URL.

I found this post on stripping the left-to-right-mark from a string in PHP and it seems related to my problem, but the solution does not work for me because my characters seem to be something else.

So how can I determine which character I have so I can remove it from the strings?

(I would post one of the URLs here as an example, but the Stack Overflow form strips the character at the end as soon as I paste it in.)

I know I could allow only certain characters in the string and discard all others. But I would still like to know which character it is, and how it gets into the database.

EDIT: The question has been answered and the code given in the accepted answer works for me:

$str = preg_replace('/\p{C}+/u', "", $str);
spirit asked Apr 17 '14 10:04



1 Answer

If the input is UTF-8 encoded, you can use a Unicode regex to match and strip invisible characters such as e2808e (U+200E, the left-to-right mark). Use the u (PCRE_UTF8) modifier together with \p{C}, which can also be written \p{Other}.
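The bytes e2808e are the UTF-8 encoding of U+200E, which is why the browser shows %E2%80%8E once the URL is percent-encoded. A minimal sketch of that connection follows; the example.com URL is made up for illustration:

```php
<?php
// Hypothetical URL with a trailing left-to-right mark (U+200E), for illustration only.
$url = "http://example.com/page\xE2\x80\x8E";

// The mark is invisible when printed, but it adds three bytes to the string.
$tail = substr($url, -3);

// Percent-encoding or hex-dumping the tail exposes the hidden bytes.
echo rawurlencode($tail), "\n"; // %E2%80%8E
echo bin2hex($tail), "\n";      // e2808e
```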

Strip out all invisibles:

$str = preg_replace('/\p{C}+/u', "", $str);
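One caveat: \p{C} also covers ordinary control characters such as tab and newline, and with the u modifier preg_replace returns null when the input is not valid UTF-8. A hedged sketch of a more defensive variant follows. The function name strip_invisibles is made up here, and keeping tab, LF and CR is an assumption about what you want to preserve:

```php
<?php
// Hypothetical helper: removes \p{C} characters but keeps tab, LF and CR.
function strip_invisibles(string $str): string
{
    // [^\P{C}\t\n\r] reads as "a \p{C} character that is not tab, LF or CR".
    $clean = preg_replace('/[^\P{C}\t\n\r]+/u', '', $str);

    // preg_replace returns null on error, e.g. invalid UTF-8 with the u modifier.
    if ($clean === null) {
        throw new InvalidArgumentException('preg_replace failed, error code ' . preg_last_error());
    }
    return $clean;
}

$url = "http://example.com/\xE2\x80\x8E";
var_dump(strip_invisibles($url) === "http://example.com/"); // bool(true)
```

For URLs specifically, you may prefer the original one-liner, since tabs and line breaks are not wanted there either.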

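\p{Other} covers the Unicode general categories Cc (control), Cf (format), Cs (surrogate), Co (private use) and Cn (unassigned).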


Detect/identify invisibles:

$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";

// collect every invisible character together with its byte offset
if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE)) {
  echo "<pre>\n";
  foreach ($out[0] as $v) {
    // $v[0] is the matched character, $v[1] its byte offset in $str
    echo "detected ".bin2hex($v[0])." @ offset ".$v[1]."\n";
  }
  echo "</pre>";
}

outputs:

detected e2808e @ offset 1
detected e2808b @ offset 5
detected e2808f @ offset 9

Test on eval.in

To identify a character, search for its hex bytes, for example on fileformat.info:

Google search: site:fileformat.info e2808e
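If the mbstring extension is available (an assumption; mb_ord needs PHP 7.2 or later), the code point can also be printed directly, which is often easier to look up than the raw bytes. A minimal sketch, reusing the test string from above:

```php
<?php
$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";

// Collect every \p{C} character and print it as a U+XXXX code point.
preg_match_all('/\p{C}/u', $str, $out);
foreach ($out[0] as $char) {
    printf("U+%04X (bytes %s)\n", mb_ord($char, 'UTF-8'), bin2hex($char));
}
// U+200E (bytes e2808e)
// U+200B (bytes e2808b)
// U+200F (bytes e2808f)
```

U+200E is the left-to-right mark, U+200B the zero-width space and U+200F the right-to-left mark.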

Jonny 5 answered Sep 28 '22 10:09
