How to remove repeating white-space characters from UTF8 string in PHP properly with regex?

Question

I'm trying to collapse repeated white-space characters in a UTF-8 string in PHP using a regex. This regex

    $txt = preg_replace( '/\s+/i' , ' ', $txt );

usually works fine, but some of the strings contain the Cyrillic letter "Р", which gets corrupted by the replacement. After some research I realized that the letter is encoded in UTF-8 as the two bytes \xD0\xA0, and since \xA0 is the non-breaking space in Latin-1 (ISO-8859-1), not ASCII, the regex replaces that byte with \x20 and the character is no longer valid.
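
The byte-level effect described here can be reproduced in isolation. The snippet below is only an illustrative sketch: it uses `str_replace` to stand in for a byte-wise match on 0xA0 (an assumption made for the demonstration, since whether `\s` matches that byte without the `u` modifier depends on the PCRE build and locale), and it uses an empty pattern with `/u` purely as a UTF-8 validity check.

```php
<?php
// "Р" (U+0420) written as its two UTF-8 bytes.
$letter = "\xD0\xA0";
echo bin2hex($letter), "\n"; // d0a0

// Stand-in for a byte-wise match that treats 0xA0 as white space.
$broken = str_replace("\xA0", " ", $letter);
echo bin2hex($broken), "\n"; // d020

// With /u, preg_match() returns false when the subject is not valid UTF-8.
var_dump(preg_match('//u', $broken)); // bool(false)
```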

Any ideas how to do this properly in PHP with regex?

Passerby · Accepted Answer

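Try the u modifier, which makes PCRE treat both the pattern and the subject string as UTF-8 instead of as single bytes: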

    $txt = "UTF 字符串 with 空格符號";
    var_dump(preg_replace("/\s+/iu", "", $txt));

Outputs:

    string(28) "UTF字符串with空格符號"
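
Note that the example above uses an empty replacement, so it deletes white space entirely. To collapse each run into a single space, as the question asks, the same pattern can be used with `' '` as the replacement. The following is a minimal sketch under that assumption; the helper name `collapse_whitespace` is hypothetical, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: collapse white-space runs in a UTF-8 string.
function collapse_whitespace(string $txt): string
{
    // With /u the subject is decoded as UTF-8, so \s is tested against
    // whole code points and never against a lone byte such as 0xA0.
    $result = preg_replace('/\s+/u', ' ', $txt);

    // preg_replace() returns null on error, e.g. for invalid UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $result;
}

echo collapse_whitespace("Привет,   Рим\t\nand  Rome"), "\n";
// Привет, Рим and Rome
```

Checking for `null` matters because, with the `u` modifier, a subject that is not valid UTF-8 makes `preg_replace` fail rather than return the string unchanged.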

Tags: regex, php, whitespace, utf-8

anandr
