I have the following piece of code, which seems to be changing my character set:
$html = "à";
echo $html; // result: à
$html = preg_replace("/\s/", "", $html);
echo $html; // result: ?
However, when I use [\t\n\r\f\v] as my pattern instead of the special character \s, it works fine:
$html = "à";
echo $html; // result: à
$html = preg_replace("/[\t\n\r\f\v]/", "", $html);
echo $html; // result: à
Why is that?
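The preg_replace() function returns a string, or an array of strings, in which all matches of a pattern (or list of patterns) found in the input are replaced with substrings. There are three different ways to use this function: 1. One pattern and a replacement string. 2. An array of patterns and a single replacement string, which replaces matches of any of the patterns. 3. An array of patterns and an array of replacement strings, where each pattern's matches are replaced with the replacement at the same position.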
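A minimal sketch of the three calling forms, using a made-up sample sentence (the subject string and replacements are illustrative only):

```php
<?php
// Hypothetical sample subject, used only for illustration.
$subject = "The quick brown fox";

// 1. One pattern and a replacement string.
$one = preg_replace("/quick/", "slow", $subject);

// 2. An array of patterns and a single replacement string:
//    every pattern's matches are replaced with "red".
$two = preg_replace(["/quick/", "/brown/"], "red", $subject);

// 3. An array of patterns and an array of replacements, paired by position:
//    "/quick/" -> "lazy", "/fox/" -> "dog".
$three = preg_replace(["/quick/", "/fox/"], ["lazy", "dog"], $subject);

echo $one, "\n";   // The slow brown fox
echo $two, "\n";   // The red red fox
echo $three, "\n"; // The lazy brown dog
```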
str_replace replaces a literal string: for instance, "foo" will only match and replace exactly "foo". preg_replace does regular expression matching: for instance, "/f.{2}/" will match and replace "foo", but also "fey", "fir", "fox", "f12", etc.
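A small side-by-side sketch of that difference, run on a hypothetical test string containing the words from the example:

```php
<?php
$words = "foo fey fir fox f12 bar";

// str_replace only touches the exact substring "foo".
$literal = str_replace("foo", "X", $words);

// preg_replace matches "f" followed by any two characters.
$regex = preg_replace("/f.{2}/", "X", $words);

echo $literal, "\n"; // X fey fir fox f12 bar
echo $regex, "\n";   // X X X X X bar
```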
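I have the same problem. It is because of UTF-8: à is encoded in UTF-8 as the two bytes 0xC3 0xA0, which in PHP you can write as "\xc3\xa0". Without the u flag, PCRE works on individual bytes, and \s can match the byte 0xA0 as if it were the Latin-1 "non-breaking space" (whether 0xA0 counts as whitespace depends on the locale PHP uses for matching). Removing that byte leaves a lone 0xC3, which is not valid UTF-8 and is displayed as ?.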
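To see the bytes involved, here is a small sketch. It deliberately does not use \s (whose behaviour on 0xA0 depends on the locale); instead it deletes the 0xA0 byte directly with str_replace to mimic what that match does, then checks the result with preg_match('//u', ...), which returns false when the subject is not valid UTF-8:

```php
<?php
// "à" written as its escaped UTF-8 bytes, so the file encoding does not matter.
$html = "\xc3\xa0";
echo bin2hex($html), "\n"; // c3a0

// Mimic a byte-oriented \s that treats 0xA0 as whitespace:
// delete the byte 0xA0 on its own.
$broken = str_replace("\xa0", "", $html);
echo bin2hex($broken), "\n"; // c3

// A lone 0xC3 is an incomplete UTF-8 sequence, so a /u match fails.
var_dump(preg_match('//u', $broken)); // bool(false)
var_dump(preg_match('//u', $html));   // int(1)
```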
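You can use the u flag to resolve the problem. It makes PCRE treat the pattern and the subject as UTF-8, so \s is tested against whole characters instead of single bytes: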
$html = preg_replace("/\s/u", "", $html);
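One thing to watch with the u flag: if the input is not valid UTF-8, preg_replace() returns null instead of a string. A minimal sketch that wraps the fix and surfaces that case (the helper name stripWhitespaceUtf8 is hypothetical, not part of PHP):

```php
<?php
// Hypothetical helper: remove all whitespace from a UTF-8 string
// without splitting multi-byte characters such as "à".
function stripWhitespaceUtf8(string $s): string
{
    $result = preg_replace('/\s/u', '', $s);
    if ($result === null) {
        // With the u flag, invalid UTF-8 input makes preg_replace() fail.
        throw new InvalidArgumentException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

$html = "a b\t\xc3\xa0\n"; // "a", space, "b", tab, "à", newline
echo stripWhitespaceUtf8($html), "\n"; // abà
```

Throwing here is just one design choice; returning the original string or logging the error would also work, as long as a null result is not silently echoed as an empty string.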