
Why is mb_convert_case in PHP 5.4 breaking my string, when in 5.2 it doesn't?

Tags: php, unicode

I have the following code:

header('Content-type: text/html; charset=utf-8');
$str = 'áá áá';
echo $str."\n";
echo mb_convert_case($str, MB_CASE_TITLE)."\n";
echo bin2hex($str)."\n";
echo bin2hex(mb_convert_case($str, MB_CASE_TITLE))."\n";

Using PHP 5.2.2, I get the following output:

áá áá
áá áá
c3a1c3a120c3a1c3a1
c3a1c3a120c3a1c3a1

Using PHP 5.4.3, I get this:

áá áá
á� á�
c3a1c3a120c3a1c3a1
c3a1e3a120c3a1e3a1

My expected output in both cases would have been:

áá áá
Áá Áá
c3a1c3a120c3a1c3a1
c381c3a120c381c3a1

So I have two questions:

  1. Why isn't the á being converted to Á?
  2. Why is PHP 5.4 breaking my strings?

Alex asked Oct 05 '12 12:10


1 Answer

Either pass in $encoding to every call to mb_ functions, or set:

mb_internal_encoding("UTF-8");

to make sure PHP knows what encoding you're working with. Otherwise the encoding comes from the mbstring.internal_encoding setting in php.ini, or defaults to ISO-8859-1 if it isn't set there either.
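A minimal sketch of both options, using the string from the question. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$str = 'áá áá';

// Option 1: pass the encoding explicitly on each call.
$explicit = mb_convert_case($str, MB_CASE_TITLE, 'UTF-8');

// Option 2: set the internal encoding once, then omit the argument.
mb_internal_encoding('UTF-8');
$implicit = mb_convert_case($str, MB_CASE_TITLE);

echo $explicit, "\n";          // Áá Áá
echo bin2hex($implicit), "\n"; // c381c3a120c381c3a1
```

Both calls should give the output the question expected, regardless of what php.ini says.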

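So your 5.4 installation is defaulting to ISO-8859-1. In that encoding the UTF-8 lead byte 0xC3 is the letter Ã, so title-casing leaves the first one in each word alone (it is already upper case, which is why á never becomes Á) and lowercases the second one to ã (0xE3). That turns c3a1 into e3a1, which is no longer a valid UTF-8 sequence.

The same happens for me in 5.2, so maybe something else is different about your 5.2 installation. Perhaps the internal encoding in its ini is set to something that has no letters at those byte positions?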
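The byte-level effect can be reproduced in isolation. This is a rough sketch, assuming mbstring is loaded; it uses mb_strtolower on a single á rather than the title-case call from the question, because a plain lowercase shows the lead byte changing without depending on how a given PHP version decides where words begin.

```php
<?php
// Assumes the mbstring extension is loaded.
// The UTF-8 bytes of 'á', written as escapes so the file's own encoding does not matter.
$a = "\xc3\xa1";

// Read as ISO-8859-1, 0xC3 is 'Ã', so lowercasing turns it into 'ã' (0xE3).
$asLatin1 = mb_strtolower($a, 'ISO-8859-1');

// Read as UTF-8, the two bytes are one already-lowercase 'á' and stay unchanged.
$asUtf8 = mb_strtolower($a, 'UTF-8');

echo bin2hex($asLatin1), "\n"; // e3a1
echo bin2hex($asUtf8), "\n";   // c3a1

// 0xE3 announces a three-byte sequence, so "e3 a1" is no longer valid UTF-8.
var_dump(mb_check_encoding($asLatin1, 'UTF-8')); // bool(false)
```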

bobince answered Nov 13 '22 02:11