
Why did this str_ireplace() work on a non-ASCII string?

Note: What I think I know is probably wrong, so please kindly fix my knowledge :)


I just answered a question about UTF-8 and PHP.

I suggested using str_ireplace('Волгоград', '', $a).

I didn't expect this to work, but it did.

I always thought PHP treated one byte as one character, which is why you need to use mb_* functions to get accurate results when using characters outside the ASCII range.

I assumed the Russian characters would take more than one byte each.

I thought str_replace() would work because the bytes could be matched regardless of whether they are multibyte or not, as long as they are in order.
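
To illustrate the byte-matching assumption above, here is a minimal sketch. It assumes the source file is saved as UTF-8, and the `$haystack` sample text is hypothetical, not taken from the original question.

```php
<?php
// Assumes this source file is saved as UTF-8.
$city = 'Волгоград';

// strlen() counts bytes; each Cyrillic letter here takes two bytes in UTF-8.
$byteCount = strlen($city);                 // 18

// Count code points with PCRE's UTF-8 mode (mb_strlen($city, 'UTF-8')
// gives the same number where the mbstring extension is installed).
$charCount = preg_match_all('/./u', $city); // 9

// str_replace() compares raw bytes, so a multibyte needle matches as long
// as the same byte sequence occurs in the haystack.
$haystack = 'Город Волгоград, Россия';      // hypothetical sample text
$result = str_replace($city, '', $haystack);

var_dump($byteCount, $charCount, $result);
```

The replacement succeeds without any knowledge of the encoding, because the needle and the matching part of the haystack are the same sequence of bytes.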

I thought str_ireplace() would not work because PHP wouldn't know how to map the non-ASCII characters to their alternate-case equivalents. But it did work.


Where and how am I wrong? Give me as much information as you can :)

alex asked Mar 28 '11 12:03


1 Answer

It works by lowercasing the text with the libc functions, which depend on the locale settings. With appropriate settings, the text is lowercased properly, provided the locale's charset matches the encoding of the bytes.
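
As a rough way to probe this, the sketch below compares a needle in the same case as the haystack with one in a different case. It assumes the file is saved as UTF-8, and the sample strings are hypothetical. Only the same-case result is certain; whether the differently cased needle matches depends on the PHP version, the active locale, and the encoding, so no output is claimed for it.

```php
<?php
// Assumes this file is saved as UTF-8; the sample strings are hypothetical.
$haystack = 'Город Волгоград';

// Needle in the same case as the haystack: the bytes are identical, so
// lowercasing both sides gives identical results and the match succeeds
// whether or not the case mapping understands Cyrillic.
$sameCase = str_ireplace('Волгоград', '', $haystack);

// Needle in a different case: this only matches if the lowercasing step
// maps these bytes correctly, which depends on the environment.
$otherCase = str_ireplace('ВОЛГОГРАД', '', $haystack);

var_dump($sameCase === 'Город ');   // bool(true)
var_dump($otherCase === 'Город ');  // environment dependent
```

If the second check prints false, the call in the question may have succeeded simply because the needle and the text were in the same case.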

Ignacio Vazquez-Abrams answered Sep 27 '22 16:09