How do I remove accents from characters in a PHP string?

Tags: php, iconv

I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL.

I'm using the following code:

    $input = "Fóø Bår";

    setlocale(LC_ALL, "en_US.utf8");
    $output = iconv("utf-8", "ascii//TRANSLIT", $input);

    print($output);

I would expect the output to be something like this:

F'oo Bar 

However, instead of the accented characters being transliterated they are replaced with question marks:

F?? B?r 

Everything I can find online indicates that setting the locale will fix this problem, but I'm already doing that. I've also checked the following details:

  1. The locale I am setting is supported by the server (included in the list produced by locale -a)
  2. The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l)
  3. The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator)
  4. The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE)
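
The checks above can be repeated in one short diagnostic script. This is only a sketch: it assumes the mbstring and iconv extensions are loaded and that the file itself is saved as UTF-8, and the variable names are illustrative. What iconv prints for the transliterated string depends on the implementation in use, so no expected output is shown for that line.

```php
<?php
// Diagnostic sketch: repeats the checks from the list above for one input string.
$input = "Fóø Bår";

// setlocale() returns the new locale name on success, or false on failure.
$locale = setlocale(LC_ALL, "en_US.utf8");
echo "setlocale: ", var_export($locale, true), "\n";

// mb_check_encoding() needs the mbstring extension.
$isUtf8 = mb_check_encoding($input, "UTF-8");
echo "valid UTF-8: ", var_export($isUtf8, true), "\n";

// iconv() returns false on failure; a string full of question marks
// means the call succeeded but transliteration did not happen.
$output = iconv("UTF-8", "ASCII//TRANSLIT", $input);
echo "iconv output: ", var_export($output, true), "\n";
```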

The cause of the problem:

The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.

Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
– PHP manual's introduction to iconv

Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.
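
As a quicker alternative to reading the whole phpinfo output, the iconv extension also exposes the implementation name and version as the constants ICONV_IMPL and ICONV_VERSION. A minimal sketch, assuming nothing beyond a standard PHP build:

```php
<?php
// ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension.
// They are only available when the extension is loaded, hence the guard.
if (extension_loaded('iconv')) {
    printf("iconv implementation: %s\n", ICONV_IMPL);
    printf("iconv version: %s\n", ICONV_VERSION);
} else {
    echo "The iconv extension is not loaded.\n";
}
```

On a server like the one described here, the first line would be expected to report glibc rather than libiconv.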

(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)

Asked by georgebrock, Jun 19 '09 12:06


1 Answer

What about the WordPress implementation?

    function remove_accents($string) {
        // Nothing to do if the string contains no bytes outside 7-bit ASCII
        if ( !preg_match('/[\x80-\xff]/', $string) )
            return $string;

        // Keys are the UTF-8 byte sequences of the accented characters
        $chars = array(
        // Decompositions for Latin-1 Supplement
        chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
        chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
        chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
        chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
        chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
        chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
        chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
        chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
        chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
        chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
        chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
        chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
        chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
        chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
        chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
        chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
        chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
        chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
        chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
        chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
        chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
        chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
        chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
        chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
        chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
        chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
        chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
        chr(195).chr(191) => 'y',
        // Decompositions for Latin Extended-A
        chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
        chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
        chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
        chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
        chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
        chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
        chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
        chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
        chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
        chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
        chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
        chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
        chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
        chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
        chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
        chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
        chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
        chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
        chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
        chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
        chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
        chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
        chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
        chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
        chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
        chr(196).chr(178) => 'IJ', chr(196).chr(179) => 'ij',
        chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
        chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
        chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
        chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
        chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
        chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
        chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
        chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
        chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
        chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
        chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
        chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
        chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
        chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
        chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
        chr(197).chr(146) => 'OE', chr(197).chr(147) => 'oe',
        chr(197).chr(148) => 'R', chr(197).chr(149) => 'r',
        chr(197).chr(150) => 'R', chr(197).chr(151) => 'r',
        chr(197).chr(152) => 'R', chr(197).chr(153) => 'r',
        chr(197).chr(154) => 'S', chr(197).chr(155) => 's',
        chr(197).chr(156) => 'S', chr(197).chr(157) => 's',
        chr(197).chr(158) => 'S', chr(197).chr(159) => 's',
        chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
        chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
        chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
        chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
        chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
        chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
        chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
        chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
        chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
        chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
        chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
        chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
        chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
        chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
        chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
        chr(197).chr(190) => 'z', chr(197).chr(191) => 's'
        );

        // Replace every listed byte sequence with its ASCII equivalent
        $string = strtr($string, $chars);

        return $string;
    }
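
To see the lookup-table technique in isolation, here is a cut-down sketch. The function names strip_accents_demo and slug_demo are hypothetical, and the map is a small illustrative subset written with literal UTF-8 characters (so the file must be saved as UTF-8); it is not the WordPress table. The second function shows one possible way to carry on from accent removal to a URL-friendly string, which was the original goal of the question.

```php
<?php
// Hypothetical helper: a reduced version of the lookup-table approach.
// The map is an illustrative subset, not the full WordPress table.
function strip_accents_demo(string $text): string
{
    $map = [
        'á' => 'a', 'å' => 'a', 'é' => 'e',
        'ó' => 'o', 'ü' => 'u', 'ñ' => 'n',
        'Œ' => 'OE', 'œ' => 'oe',
    ];
    // With an array argument, strtr() replaces every key with its value.
    // Matching is byte-wise, so multi-byte UTF-8 keys work as expected.
    return strtr($text, $map);
}

// Hypothetical helper: turn the de-accented text into a URL-friendly slug.
function slug_demo(string $text): string
{
    $ascii = strtolower(strip_accents_demo($text));
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($slug, '-');
}

echo strip_accents_demo("Bår Señor"), "\n";   // Bar Senor
echo slug_demo("Bår Señor Œuvre"), "\n";      // bar-senor-oeuvre
```

Characters that are missing from the map pass through strtr unchanged, so in this sketch they end up replaced by a hyphen in the slug step rather than transliterated.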

To understand what this function does, check the conversion table:

    Latin-1 Supplement:

    À => A   Á => A   Â => A   Ã => A   Ä => A   Å => A   Ç => C   È => E
    É => E   Ê => E   Ë => E   Ì => I   Í => I   Î => I   Ï => I   Ñ => N
    Ò => O   Ó => O   Ô => O   Õ => O   Ö => O   Ù => U   Ú => U   Û => U
    Ü => U   Ý => Y   ß => s   à => a   á => a   â => a   ã => a   ä => a
    å => a   ç => c   è => e   é => e   ê => e   ë => e   ì => i   í => i
    î => i   ï => i   ñ => n   ò => o   ó => o   ô => o   õ => o   ö => o
    ù => u   ú => u   û => u   ü => u   ý => y   ÿ => y

    Latin Extended-A:

    Ā => A   ā => a   Ă => A   ă => a   Ą => A   ą => a   Ć => C   ć => c
    Ĉ => C   ĉ => c   Ċ => C   ċ => c   Č => C   č => c   Ď => D   ď => d
    Đ => D   đ => d   Ē => E   ē => e   Ĕ => E   ĕ => e   Ė => E   ė => e
    Ę => E   ę => e   Ě => E   ě => e   Ĝ => G   ĝ => g   Ğ => G   ğ => g
    Ġ => G   ġ => g   Ģ => G   ģ => g   Ĥ => H   ĥ => h   Ħ => H   ħ => h
    Ĩ => I   ĩ => i   Ī => I   ī => i   Ĭ => I   ĭ => i   Į => I   į => i
    İ => I   ı => i   Ĳ => IJ  ĳ => ij  Ĵ => J   ĵ => j   Ķ => K   ķ => k
    ĸ => k   Ĺ => L   ĺ => l   Ļ => L   ļ => l   Ľ => L   ľ => l   Ŀ => L
    ŀ => l   Ł => L   ł => l   Ń => N   ń => n   Ņ => N   ņ => n   Ň => N
    ň => n   ŉ => N   Ŋ => n   ŋ => N   Ō => O   ō => o   Ŏ => O   ŏ => o
    Ő => O   ő => o   Œ => OE  œ => oe  Ŕ => R   ŕ => r   Ŗ => R   ŗ => r
    Ř => R   ř => r   Ś => S   ś => s   Ŝ => S   ŝ => s   Ş => S   ş => s
    Š => S   š => s   Ţ => T   ţ => t   Ť => T   ť => t   Ŧ => T   ŧ => t
    Ũ => U   ũ => u   Ū => U   ū => u   Ŭ => U   ŭ => u   Ů => U   ů => u
    Ű => U   ű => u   Ų => U   ų => u   Ŵ => W   ŵ => w   Ŷ => Y   ŷ => y
    Ÿ => Y   Ź => Z   ź => z   Ż => Z   ż => z   Ž => Z   ž => z   ſ => s

You can generate the conversion table yourself by iterating over the $chars array. The array is local to the function, so run the loop inside the function or copy the array out first:

    foreach ($chars as $k => $v) {
        printf("%s => %s\n", $k, $v);
    }
Answered by dynamic, Oct 10 '22 06:10