
How to strip Unicode characters (LEFT-TO-RIGHT MARK) from a string in PHP

Tags: regex, php, utf-8

I'm trying to remove LEFT-TO-RIGHT-MARK (\u200e) and RIGHT-TO-LEFT-MARK (\u200f) from a string before encoding it as JSON. Neither of the following seems to work:

$s = mb_ereg_replace("\u200e", '', $s);
$s = preg_replace("#\u200e#u", '', $s);
$s = preg_replace("#\u200e#", '', $s);

Any help is appreciated!

Marc asked Dec 18 '09 18:12


2 Answers

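After wrestling with this issue for a couple of days, I finally found the answer! PCRE writes a code point as \x{hhhh}, and the pattern needs the u modifier so that both it and the subject are treated as UTF-8: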

$str = preg_replace('/(\x{200e}|\x{200f})/u', '', $str);
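
To tie this back to the question (strip the marks, then encode as JSON), here is a minimal sketch. It assumes PHP 7 or later for the type declarations and the ?? operator; the function name strip_direction_marks() is hypothetical and only used for illustration. A character class is used instead of the alternation, which should match the same two code points.

```php
<?php
// Hypothetical helper; the name is not from the original answer.
function strip_direction_marks(string $s): string
{
    // \x{200E} = LEFT-TO-RIGHT MARK, \x{200F} = RIGHT-TO-LEFT MARK.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $clean = preg_replace('/[\x{200E}\x{200F}]/u', '', $s);

    // preg_replace() returns null on failure (for example, when the
    // subject is not valid UTF-8); fall back to the input in that case.
    return $clean ?? $s;
}

// LRM + "foo" + RLM, written as raw UTF-8 bytes.
$input = "\xE2\x80\x8Efoo\xE2\x80\x8F";

echo json_encode(strip_direction_marks($input)), "\n"; // prints "foo"
```

Falling back to the original string on failure is a design choice for this sketch; depending on the application, raising an error for invalid UTF-8 may be preferable.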
Tim Groeneveld answered Nov 20 '22 06:11


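Your Unicode escaping is wrong: \u200e is not expanded inside a PHP string literal, and PCRE does not support \u in a pattern. Encoded as UTF-8, U+200E and U+200F are the byte sequences E2 80 8E and E2 80 8F, so matching those bytes should work:

preg_replace('/\xE2\x80(\x8E|\x8F)/', '', $string)

Test:

<?php
  // LRM + "fo" + LRM + "o" + RLM, written as raw UTF-8 bytes
  $string = "\xE2\x80\x8E" . 'fo' . "\xE2\x80\x8E" . 'o' . "\xE2\x80\x8F";
  echo $string . "\n";
  echo preg_replace('/\xE2\x80(\x8E|\x8F)/', '', $string);
?>

Or, use str_replace():

  str_replace(array("\xE2\x80\x8E", "\xE2\x80\x8F"), '', $string);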
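
To check which bytes a code point actually occupies in a UTF-8 string, bin2hex() can help. The following is a minimal sketch, assuming PHP 7 or later, where double-quoted strings accept the "\u{...}" escape; the variable names are illustrative only.

```php
<?php
// Assumes PHP 7+: "\u{...}" yields the UTF-8 encoding of the code point.
$lrm = "\u{200E}"; // LEFT-TO-RIGHT MARK
$rlm = "\u{200F}"; // RIGHT-TO-LEFT MARK

echo bin2hex($lrm), "\n"; // e2808e
echo bin2hex($rlm), "\n"; // e2808f

// Byte-wise removal with str_replace(); no regex engine involved.
$s = $lrm . 'foo' . $rlm;
$clean = str_replace([$lrm, $rlm], '', $s);

echo bin2hex($clean), "\n"; // 666f6f, the bytes of "foo"
```

Because str_replace() works on raw bytes, this approach does not depend on the u modifier, but it does assume the input string is UTF-8 encoded.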
tmont answered Nov 20 '22 04:11