I'm trying to remove repeated whitespace characters from a UTF-8 string in PHP using a regex. This regex
$txt = preg_replace( '/\s+/i' , ' ', $txt );
usually works fine, but some of the strings contain the Cyrillic letter "Р", which gets corrupted by the replacement. After a little research I realized that the letter is encoded in UTF-8 as the two bytes \xD0 \xA0, and since \xA0 is the non-breaking space in Latin-1 (ISO-8859-1), the regex replaces that byte with \x20 and the character is no longer valid.
Any ideas how to do this properly in PHP with regex?
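Try the u modifier, which makes PCRE treat both the pattern and the subject as UTF-8, so \s matches whole characters rather than individual bytes: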
$txt="UTF 字符串 with 空格符號";
var_dump(preg_replace("/\\s+/iu","",$txt));
Outputs:
string(28) "UTF字符串with空格符號"
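Note that the example above strips whitespace entirely, while the question asks for runs to be collapsed into a single space. A minimal sketch of that variant follows, assuming the source file is saved as UTF-8; the helper name collapse_whitespace is hypothetical and not part of the original posts. It also checks for null, which preg_replace returns when the subject is not valid UTF-8 under the u modifier.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original thread.
function collapse_whitespace(string $text): string
{
    // With the u modifier, \s is matched against UTF-8 code points,
    // so the \xA0 byte inside "Р" (\xD0\xA0) is never treated as whitespace.
    $result = preg_replace('/\s+/u', ' ', $text);

    // preg_replace() returns null on failure, e.g. malformed UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace failed, error code ' . preg_last_error()
        );
    }

    return $result;
}

// "Р" is U+0420, which UTF-8 encodes as the bytes D0 A0.
echo bin2hex('Р'), PHP_EOL;                                   // d0a0

// Runs of spaces, tabs and newlines become one space; the Cyrillic text survives.
echo collapse_whitespace("Привет,  \t Россия\n\nРР"), PHP_EOL; // Привет, Россия РР
```

If the input might not be valid UTF-8 (for example, text in a legacy single-byte encoding), converting it first is probably safer than dropping the u modifier, since without it the result can depend on the locale in use.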