 

Why does PHP's iconv need setlocale?

Tags: php, locale, iconv

I'm currently trying to remove all special characters and accents from a UTF-8 string by turning them into their equivalent ASCII characters where possible.

So I'm simply using this code:

$result = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $input);

The problem is that for example the word "début" turns into "dbut" instead of "debut". To make it work, I need to add a call to setlocale, like this:

setlocale(LC_ALL, 'en_US.UTF8');
$result = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $input);

And I don't understand why. I thought UTF-8 and ASCII were always the same, whatever locale you use.

EDIT: I didn't mean that UTF-8 equals ASCII; I meant that UTF-8 always equals UTF-8 and ASCII always equals ASCII.

Tomaka17 asked Nov 05 '22 12:11


1 Answer

The subset of UTF-8 that overlaps with ASCII (code points 0-127) is indeed identical to ASCII. However, accented Latin characters such as "é" are not part of the ASCII character set, so iconv has to transliterate them, and how it does that depends on the current locale. If you don't call setlocale yourself, the default locale is used (often the minimal "C" locale, which evidently has no transliteration rule for these accented characters), so //TRANSLIT finds no replacement and //IGNORE silently drops the character.
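As a rough illustration of the point above, here is a minimal sketch that sets the locale only for the duration of the conversion and then restores it. The helper name to_ascii and the list of locale names are assumptions for illustration, not part of the original question. It also assumes that switching only LC_CTYPE is enough, which should hold for glibc-based iconv but may not for other implementations, and that at least one of the listed locales is installed on the system.

```php
<?php
// Hypothetical helper (not from the original post): transliterate a UTF-8
// string to ASCII under a chosen LC_CTYPE locale, then restore the locale
// that was active before the call.
function to_ascii(string $input, array $locales = ['en_US.UTF-8', 'en_US.UTF8'])
{
    // Passing "0" queries the current LC_CTYPE locale without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() tries each name in turn; it returns false if none is installed,
    // in which case the conversion below runs under the unchanged locale.
    $applied = setlocale(LC_CTYPE, $locales);

    // Same conversion as in the question.
    $result = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $input);

    // Restore the original locale so the change does not leak into other code.
    if ($applied !== false && $previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $result;
}

// The output depends on the installed locales and the iconv implementation.
var_dump(to_ascii('début'));
```

Changing only LC_CTYPE, rather than LC_ALL as in the question, avoids side effects on number and date formatting elsewhere in the script.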

In general, iconv can be a little iffy; this is mentioned in the introduction of the extension:

This module contains an interface to iconv character set conversion facility. With this module, you can turn a string represented by a local character set into the one represented by another character set, which may be the Unicode character set. Supported character sets depend on the iconv implementation of your system. Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
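Since the results depend on the underlying implementation, it can help to check which one your PHP build is linked against. The iconv extension exposes this through the ICONV_IMPL and ICONV_VERSION constants; a small sketch:

```php
<?php
// Report which iconv implementation this PHP build uses (for example glibc
// or libiconv) and its version string.
echo 'iconv implementation: ', ICONV_IMPL, PHP_EOL;
echo 'iconv version: ', ICONV_VERSION, PHP_EOL;
```

If transliteration behaves differently between two machines, comparing these two values is a reasonable first step.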

Jon answered Nov 14 '22 23:11