Does preg_replace() change my character set?

I have the following piece of code which seems to be changing my character set.

     $html = "à";
     echo $html;  // result: à
     $html = preg_replace("/\s/", "", $html);
     echo $html;  // result: ?

However, when I use [\t\n\r\f\v] as my pattern instead of the \s shorthand, it works fine:

     $html = "à";
     echo $html;  // result: à
     $html = preg_replace("/[\t\n\r\f\v]/", "", $html);
     echo $html;  // result: à

Why is that?

David Janssen asked Oct 28 '13 08:10


1 Answer

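I have the same problem. It is caused by the UTF-8 encoding.

à is encoded as the two bytes 0xC3 0xA0 in UTF-8. In PHP you can write it as "\xc3\xa0".

Without the u flag, PCRE treats the subject as a sequence of single bytes, and depending on the locale \s can match the byte 0xA0 as if it were the Latin-1 non-breaking space. Removing that byte leaves a lone 0xC3, which is not valid UTF-8, so it is displayed as ?. The explicit class [\t\n\r\f\v] contains only ASCII characters, so it never matches 0xA0.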
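The byte-level effect can be reproduced without depending on the locale. The sketch below is only an illustration, not what PCRE does internally: it uses str_replace() to simulate a byte-wise \s that counts 0xA0 as whitespace, and the variable names are made up for the example.

```php
<?php
// "à" written as explicit UTF-8 bytes, so the file's own encoding does not matter.
$html = "\xc3\xa0";
$before = bin2hex($html);                    // "c3a0"

// Assumption for illustration: simulate a byte-wise \s that treats 0xA0 as whitespace.
$stripped = str_replace("\xa0", "", $html);
$after = bin2hex($stripped);                 // "c3"

// An empty pattern with the u flag fails to match when the subject is not valid UTF-8.
$stillValidUtf8 = preg_match('//u', $stripped) === 1;   // false

echo "$before -> $after, valid UTF-8: " . var_export($stillValidUtf8, true) . "\n";
```

The leftover 0xC3 byte is the first half of a two-byte sequence with nothing after it, which is why a browser or terminal shows a replacement character.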

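You can use the u flag to resolve the problem. It makes PCRE treat both the pattern and the subject as UTF-8, so \s is matched against whole characters instead of single bytes: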

$html = preg_replace("/\s/u", "", $html);
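A slightly fuller sketch of the same fix, assuming the input is valid UTF-8. The sample string and the error handling are illustrative additions, not part of the original question:

```php
<?php
// "à b\n" written as explicit bytes: à (c3 a0), space, "b", newline.
$html = "\xc3\xa0 b\n";

$clean = preg_replace('/\s/u', '', $html);

// With the u flag, preg_replace() returns null if the subject is not valid UTF-8.
if ($clean === null) {
    throw new RuntimeException('preg_replace failed: ' . preg_last_error());
}

echo bin2hex($clean), "\n";   // c3a062, i.e. "àb"
```

The null check matters because the u flag makes invalid UTF-8 input an error rather than something that is silently processed byte by byte.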
Fabien Sa answered Nov 15 '22 08:11