Is there any good solution out there that does this transliteration in a good manner?
I've tried using iconv()
, but is very annoying and it does not behave as one might expect.
//TRANSLIT
will try to replace what it can, leaving everything nonconvertible as "?" //IGNORE
will not leave "?" in text, but will also not transliterate and will also raise E_NOTICE
when nonconvertible char is found, so you have to use iconv with @ error suppressor//IGNORE//TRANSLIT
(as some people suggested in PHP forum) is actually same as //IGNORE
(tried it myself on php versions 5.3.2 and 5.3.13)//TRANSLIT//IGNORE
is same as //TRANSLIT
It also uses current locale settings to transliterate.
WARNING - a lot of text and code is following!
Here are some examples:
$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + ¶ + @';
echo '<br />original: ' . $text;
echo '<br />regular: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> regular: Regular ascii text + ????? + ???ss + ?????? + ae?EUR + $ + ? + @
setlocale(LC_ALL, 'en_GB');
echo '<br />en_GB: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
setlocale(LC_ALL, 'en_GB.UTF8'); // will this work?
echo '<br />en_GB.UTF8: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB.UTF8: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
Ok, that did convert č ć š ä ö ü ß é ĕ ě ė ë ȩ and æ, but why not đ and ø?
// now specific locales
setlocale(LC_ALL, 'hr_Hr'); // this should fix croatian đ, right?
echo '<br />hr_Hr: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// wrong > hr_Hr: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
setlocale(LC_ALL, 'sv_SE'); // so this will fix swedish ø?
echo '<br />sv_SE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// will not > sv_SE: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
//this is interesting
setlocale(LC_ALL, 'de_DE');
echo '<br />de_DE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> de_DE: Regular ascii text + cczs? + aeoeuess + eeeeee + ae?EUR + $ + ? + @
// actually this is what any german would expect since ä ö ü really is same as ae oe ue
Lets try with //IGNORE
:
echo '<br />ignore: ' . iconv("UTF-8", "ASCII//IGNORE", $text);
//> ignore: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 49"
// with translit?
echo '<br />ignore/translit: ' . iconv("UTF-8", "ASCII//IGNORE//TRANSLIT", $text);
//same as ignore only> ignore/translit: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 54"
// translit/ignore?
echo '<br />translit/ignore: ' . iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $text);
//same as translit only> translit/ignore: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @
Using solution of this guy also does not work as wanted: Regular ascii text + YYYYY + aous + eYYYeY + aoY + $ + � + @
Even using PECL intl Normalizer class (which is not awailable always even if you have PHP > 5.3.0, since ICU package intl uses may not be available to PHP i.e. on certain hosting servers) produces wrong result:
echo '<br />normalize: ' .preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD));
//>normalize: Regular ascii text + cczsđ + aouß + eeeeee + æø€ + $ + ¶ + @
So is there any other way of doing this right or the only proper thing to do is to do preg_replace()
or str_replace()
and define transliteration tables yourself?
// appendix: I have found on ZF wiki debate from 2008 about proposal for Zend_Filter_Transliterate but project was dropped since in some languages it is not possible to convert (i.e. chinese), but still for any latin- and cyrilic-based language IMO this option should exist.
The toAscii() function of Patchwork\Utf8 does exactly this, see:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/src/Patchwork/Utf8.php
It leverages iconv and intl's Normalizer to remove accents, split ligatures and do many other generic transliterations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With