PHP multibyte safe preg_replace Vs. str_replace

Good day!

I am having some trouble with preg_replace and UTF-8 characters. The following code fragment:

$v = "line1\nline2\r\nмы хотели бы поблагодарить";
print $v;
print preg_replace("#\R#", "", $v);
print str_replace("\n", "", $v);

returns the following output:

line1
line2
мы хотели бы поблагодарить

line1line2мы �отели бы поблагодарить

line1line2
мы хотели бы поблагодарить

For some reason the х is unreadable when \R is used, but it is unaffected when \n is used. As \R is PCRE-specific, I suppose this is what causes the problem. Does anybody have a clue how I could use \R (which str_replace does not accept) in preg_replace? I fear this problem may occur in many other cases, not only with the letter х.

Fabbio asked Apr 17 '26 18:04


1 Answer

Since your input is UTF-8, you must add the /u modifier to the regex so that the input is handled correctly:

$v = "line1\nline2\r\nмы хотели бы поблагодарить";
echo preg_replace('/\R/u', "", $v);
// => line1line2мы хотели бы поблагодарить

See IDEONE demo

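The /u modifier is required whenever the pattern or the input may contain multibyte UTF-8 characters. Without it, PCRE processes the string byte by byte, and \R also matches the single byte 0x85 (NEL). The letter х (U+0445) is encoded in UTF-8 as the bytes D1 85, so its second byte is stripped and the character is corrupted.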
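
To see where the corruption comes from, it can help to look at the raw bytes. The following is only a minimal sketch: the variable names ($kha, $broken, $fixed, $plain) are made up for illustration, the file is assumed to be saved as UTF-8, and the second result assumes PCRE's default \R behaviour (any Unicode newline sequence). It also shows a str_replace variant that lists the line endings explicitly, which may be enough if only \r\n, \n and \r need to be removed.

```php
<?php
// Illustrative sketch; names are hypothetical and the file must be saved as UTF-8.
$v = "line1\nline2\r\nмы хотели бы поблагодарить";

// The Cyrillic letter "х" (U+0445) is two bytes in UTF-8: D1 85.
$kha = "х";
echo bin2hex($kha), "\n"; // d185

// Without /u the subject is scanned byte by byte. In that mode \R also
// matches the lone byte 0x85 (NEL), so the second byte of "х" is removed.
$broken = preg_replace('/\R/', '', $kha);
echo bin2hex($broken), "\n"; // d1 (an incomplete UTF-8 sequence)

// With /u the subject is treated as UTF-8, so only real line breaks match.
$fixed = preg_replace('/\R/u', '', $v);
echo $fixed, "\n"; // line1line2мы хотели бы поблагодарить

// Alternative without a regex: replace the common line endings explicitly.
// "\r\n" comes first so it is handled as one unit. This is byte-safe for
// UTF-8 because multibyte sequences never contain the bytes 0x0A or 0x0D.
$plain = str_replace(["\r\n", "\n", "\r"], '', $v);
echo $plain, "\n"; // line1line2мы хотели бы поблагодарить
```

Note that the str_replace variant does not cover the less common separators that \R matches under /u (such as U+2028), so the regex with /u remains the closer equivalent of the original intent.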

Wiktor Stribiżew answered Apr 20 '26 08:04


