I have a problem with UTF-8 and mb_strtoupper.
mb_internal_encoding('UTF-8');
$guesstitlestring='Le Courrier de Sáint-Hyácinthe';
$encoding=mb_detect_encoding($guesstitlestring);
if ($encoding!=='UTF-8') $guesstitlestring=mb_convert_encoding($guesstitlestring,'UTF-8',$encoding);
echo "DEBUG1 $guesstitlestring\n";
$guesstitlestring=mb_strtoupper($guesstitlestring);
echo "DEBUG2 $guesstitlestring\n";
Result:
DEBUG1 Le Courrier de Sáint-Hyácinthe
DEBUG2 LE COURRIER DE S?INT-HY?CINTHE
I don't understand why this is happening. I'm trying to be as careful as I can with the encoding: the string starts out as UTF-8, is checked, and is reconverted to UTF-8 if necessary. It's a nightmare!
UPDATE
So I've figured out that this was caused by passing the arguments in through the console and reading the results back out of the console: the text was garbled both on the way in and on the way out. The solution is not to pass arguments in, or read results out, that way.
Thank you everyone for your help in resolving this issue!
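One way to confirm this kind of diagnosis is to look at the raw bytes that actually arrive from the console. The following is only a minimal sketch: describe_bytes() is a hypothetical helper name, not part of PHP, and it assumes the script is run from the command line so that $argv is populated.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8 and show its raw bytes in hex.
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return sprintf('%s [%s]', $valid, implode(' ', str_split(bin2hex($s), 2)));
}

// Dump every command-line argument exactly as PHP received it.
foreach (array_slice($argv ?? [], 1) as $i => $arg) {
    echo "arg $i: ", describe_bytes($arg), "\n";
}
```

A UTF-8 "á" shows up as the two bytes c3 a1, while an ISO-8859-1 "á" is the single byte e1, so the dump shows which encoding the console really sent.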
Instead of strtoupper()/mb_strtoupper(), use mb_convert_case(), since upper-case conversion is very tricky across different encodings. Also make sure your string IS UTF-8.
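$content = 'Le Courrier de Sáint-Hyácinthe';
mb_internal_encoding('UTF-8');
// Accept the string as UTF-8 only if it validates and survives a round trip through UTF-32.
if (!mb_check_encoding($content, 'UTF-8')
    || $content !== mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32')) {
    // Assumption: non-UTF-8 input is ISO-8859-1; substitute the real source encoding.
    // Without a source argument, the internal encoding (UTF-8 here) would be assumed.
    $content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
}
// LE COURRIER DE SÁINT-HYÁCINTHE
echo mb_convert_case($content, MB_CASE_UPPER, "UTF-8");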
Working example: http://3v4l.org/enEfm#v443
See also my comment at the PHP website about the converter: http://www.php.net/manual/function.utf8-encode.php#102382
It works for me, but only when the php file itself is saved as UTF-8 and when the terminal that I'm in expects UTF-8. I think what is happening for you is that the file is saved as ISO-8859-1 and your terminal is expecting ISO-8859-1.
First, mb_detect_encoding doesn't actually work for this string. Even when the PHP file is not saved as UTF-8, it still reports UTF-8.
When you print the original string, PHP emits the ISO-8859-1 bytes unchanged and your terminal displays them just fine. When you then convert to upper case as UTF-8, those bytes are not valid UTF-8, so each accented character comes out as ?.
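The effect can be reproduced without depending on how any file is saved by writing the ISO-8859-1 bytes explicitly. This is a rough sketch rather than a definitive fix: it assumes the real source encoding is ISO-8859-1, and it uses strict checks instead of the default detection.

```php
<?php
mb_internal_encoding('UTF-8');

// "Sáint-Hyácinthe" as ISO-8859-1 bytes: á is the single byte 0xE1.
$latin1 = "S\xE1int-Hy\xE1cinthe";

// Strict validation rejects the string, because 0xE1 opens a multi-byte
// UTF-8 sequence that is never completed.
var_dump(mb_check_encoding($latin1, 'UTF-8'));          // bool(false)
var_dump(mb_detect_encoding($latin1, ['UTF-8'], true)); // bool(false)

// Assumption: the real source encoding is ISO-8859-1.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// With genuinely UTF-8 input, the upper-case conversion keeps the accents.
echo mb_strtoupper($utf8, 'UTF-8'), "\n";
```

Once the source encoding is named explicitly, the conversion no longer depends on what mb_detect_encoding guesses.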
I created two versions of this file. I saved it in my text editor as ISO-8859-1 under the name iso-8859-1.php. Then I used iconv to convert the entire file to UTF-8 and saved it as utf-8.php:
$ iconv -f ISO-8859-1 -t UTF-8 iso-8859-1.php > utf-8.php
I added a line to print the encoding that mb_detect_encoding returns.
$ file iso-8859-1.php
iso-8859-1.php: PHP script, ISO-8859 text
$ php iso-8859-1.php
ENCODING: UTF-8
DEBUG1 Le Courrier de S�int-Hy�cinthe
DEBUG2 LE COURRIER DE S?INT-HY?CINTHE
$ file utf-8.php
utf-8.php: PHP script, UTF-8 Unicode text
$ php utf-8.php
ENCODING: UTF-8
DEBUG1 Le Courrier de Sáint-Hyácinthe
DEBUG2 LE COURRIER DE SÁINT-HYÁCINTHE
My terminal actually expects UTF-8 text, so when I print ISO-8859-1 text it gets mangled. Everything works correctly when the file is saved as UTF-8 and the terminal expects UTF-8.
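Both conditions can be checked from inside a script. The sketch below is only a rough check under stated assumptions: locale_expects_utf8() is a hypothetical helper, and the environment variables LC_ALL, LC_CTYPE and LANG are assumed to describe the terminal, which is typical on Unix-like systems but not guaranteed.

```php
<?php
// Hypothetical helper: does a POSIX locale name such as "en_US.UTF-8" advertise UTF-8?
function locale_expects_utf8(string $locale): bool
{
    return preg_match('/\.utf-?8\b/i', $locale) === 1;
}

// Assumption: a Unix-like shell where these variables describe the terminal.
$locale = getenv('LC_ALL') ?: (getenv('LC_CTYPE') ?: (getenv('LANG') ?: ''));

// This literal is stored in whatever encoding the file itself was saved in.
$literal = 'á';

echo 'source literal is valid UTF-8: ', var_export(mb_check_encoding($literal, 'UTF-8'), true), "\n";
echo 'locale advertises UTF-8:       ', var_export(locale_expects_utf8($locale), true), "\n";
```

If either line prints false, the file encoding and the terminal encoding disagree with the UTF-8 assumption made by the mb_* calls.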