How to properly remove repeating white-space characters from a UTF-8 string in PHP with regex?

I'm trying to remove repeating white-space characters from a UTF-8 string in PHP using regex. This regex

    $txt = preg_replace( '/\s+/i' , ' ', $txt );

usually works fine, but some of the strings contain the Cyrillic letter "Р", which gets corrupted by the replacement. After some research I realized that the letter is encoded in UTF-8 as the two bytes \xD0 \xA0, and since \xA0 is the non-breaking space in Latin-1 (ISO-8859-1), the regex replaces that byte with \x20 and the character is no longer valid.
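To make the byte-level picture concrete, here is a minimal sketch. It assumes the file is saved as UTF-8. Whether `\s` matches the byte 0xA0 without the `u` modifier depends on the locale and on PCRE's character tables, so the corruption is simulated with `str_replace()` rather than reproduced with the regex itself.

```php
<?php
// Assumes this source file is saved as UTF-8.
$letter = "Р"; // Cyrillic capital letter ER, U+0420

// UTF-8 stores U+0420 as two bytes: 0xD0 followed by 0xA0.
$hex = bin2hex($letter);
echo $hex, "\n";

// Simulate what a byte-oriented replacement does when it treats
// 0xA0 as white space and swaps it for an ordinary space (0x20).
$broken = str_replace("\xA0", "\x20", $letter);

// An empty pattern with the u modifier only matches if the subject
// is valid UTF-8, so it doubles as a validity check.
$isValidUtf8 = preg_match('//u', $broken) === 1;
var_dump($isValidUtf8);
```

The leftover 0xD0 byte is a lead byte with no continuation byte after it, which is why the result is no longer valid UTF-8.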

Any ideas how to do this properly in PHP with regex?

anandr asked Nov 19 '12 08:11


1 Answer

Try the u modifier:

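    $txt = "UTF 字符串 with 空格符號";
    // The u modifier makes PCRE treat the pattern and the subject as UTF-8.
    // Replacing with "" strips white space; use " " to collapse each run to one space.
    var_dump(preg_replace("/\\s+/iu", "", $txt));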

Outputs:

    string(28) "UTF字符串with空格符號"
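Applied to the original question (collapsing each run of white space to a single space rather than removing it), the same modifier could be used as in the sketch below. The helper name `collapse_whitespace` is hypothetical and not part of the answer. The fallback branch relies on `preg_replace()` returning `null` when the subject is not valid UTF-8 under the `u` modifier.

```php
<?php
/**
 * Hypothetical helper: collapse every run of white space into one space.
 * Requires PHP 7+ for the scalar type declarations.
 */
function collapse_whitespace(string $text): string
{
    // With the u modifier the subject is decoded as UTF-8 code points,
    // so the 0xA0 byte inside "Р" (0xD0 0xA0) is never matched on its own.
    $result = preg_replace('/\s+/u', ' ', $text);

    // preg_replace() returns null on error, for example when the subject
    // is not valid UTF-8; in that case return the input unchanged.
    return $result === null ? $text : $result;
}

$clean = collapse_whitespace("Россия \t\n  Родина");
echo $clean, "\n";
```

Returning the input unchanged on failure is only one possible choice; checking `preg_last_error()` and reporting the problem may suit stricter code better.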
Passerby answered Nov 15 '22 09:11