Why doesn't Normalizer::normalize (PHP) work?

Tags: php, normalization, intl

I'm trying to normalize strings with characters like 'áéíóú' to 'aeiou' to simplify searches.

According to an answer to this question, I should use the Normalizer class for this.

The problem is that the normalize function appears to do nothing. For example, this code:

<?php echo 'Pérez, NFC: ' . normalizer_normalize('Pérez', Normalizer::NFC) 
    . ' NFD: ' .normalizer_normalize('Pérez', Normalizer::NFD)
    . ' NFKC: ' .normalizer_normalize('Pérez', Normalizer::NFKC) 
    . ' NFKD: ' .normalizer_normalize('Pérez', Normalizer::NFKD)?>
<br/>
<?php echo 'aáàä, êëéè,' 
    . ' FORM_C: ' . normalizer_normalize('aáàä, êëéè', Normalizer::FORM_C )
    . ' FORM_D: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_D)
    . ' FORM_KC: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KC)
    . ' FORM_KD: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KD)?>

shows:

Pérez, NFC: Pérez NFD: Pérez NFKC: Pérez NFKD: Pérez
aáàä, êëéè, FORM_C: aáàä, êëéè FORM_D: aáàä, êëéè FORM_KC: aáàä, êëéè FORM_KD: aáàä, êëéè

What is normalize actually supposed to do?

---EDITED---

It gets stranger. When I copy and paste the result from the web browser, in the editor and on the original page I see:

FORM_D: aáàä, êëéè

but on the Stack Overflow question page I see (only in Code Sample mode):

FORM_D: aáàä, êëéè


asked Aug 30 '13 07:08

4 Answers

Normalizer with FORM_D can split the diacritics out from the base characters, then preg_replace can eliminate the diacritics:

$string = 'áéíóú';
echo preg_replace('/[\x{0300}-\x{036f}]/u', '', Normalizer::normalize($string, Normalizer::FORM_D));
// aeiou
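
The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the intl extension is loaded, the input is valid UTF-8, and the function name strip_diacritics is hypothetical. It also swaps the explicit U+0300..U+036F range for the broader \p{Mn} property (all non-spacing marks), which is a variation on the snippet above, not the same pattern.

```php
<?php
// Hypothetical helper (not part of intl): decompose, then drop combining marks.
function strip_diacritics(string $text): string
{
    // FORM_D splits precomposed letters into base letter + combining mark(s).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalize() returns false on failure, e.g. for invalid UTF-8 input.
        return $text;
    }

    // \p{Mn} matches non-spacing marks, a superset of U+0300..U+036F.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    // preg_replace() returns null on error; fall back to the original text.
    return $stripped ?? $text;
}

echo strip_diacritics('Pérez'), "\n";      // Perez
echo strip_diacritics('aáàä, êëéè'), "\n"; // aaaa, eeee
```

Letters that have no canonical decomposition, such as ø or ł, pass through unchanged, so this approach only covers characters that are a base letter plus combining marks.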

answered Sep 23 '22 21:09

OXiGEN

Found on this page (the linked document now has different wording; the old version no longer exists):

Unicode and internationalization is a large topic, but you should know at least one more important thing. For historical reasons, Unicode allows alternative representations of some characters. For example, á can be written either as one precomposed character á with the Unicode code point U+00E1 or as a decomposed sequence of the letter a (U+0061) combined with the accent ´ (U+0301). For purposes of comparison and sorting, two such representations should be taken as equal. To solve this, the intl library provides the Normalizer class. This class in turn provides the normalize() method, which you can use to convert a string to a normalized composed or decomposed form. Your application should consistently transform all strings to one or the other form before performing comparisons.

echo Normalizer::normalize("a\u{0301}", Normalizer::FORM_C); // á (single code point U+00E1)
echo Normalizer::normalize("\u{00E1}", Normalizer::FORM_D);  // a followed by combining U+0301

So eliminating accents (and similar) is not the purpose of Normalizer.
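
This is probably why the output in the question looks unchanged: the composed and decomposed forms render identically, but their bytes differ. A quick way to check, assuming the intl extension is loaded and PHP 7.0 or later (for the "\u{...}" escape), might be:

```php
<?php
// é written as the single precomposed code point U+00E9.
$composed   = "P\u{00E9}rez";
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);

var_dump($composed === $decomposed); // bool(false): same rendering, different bytes
echo strlen($composed), "\n";        // 6: é takes 2 bytes in UTF-8
echo strlen($decomposed), "\n";      // 7: e (1 byte) + U+0301 (2 bytes)
echo bin2hex($decomposed), "\n";     // 5065cc8172657a

// Converting back to FORM_C restores the original byte sequence.
var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed); // bool(true)
```

Comparing with strlen() or bin2hex() rather than by eye shows that normalize did change the string.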

answered Sep 19 '22 21:09

francadaval

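What you are looking for is iconv("UTF-8", "ASCII//TRANSLIT", $text). (With ISO-8859-1 as the target, characters such as á are kept, because Latin-1 can represent them; only characters missing from the target charset are transliterated.)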

http://php.net/manual/function.iconv.php

Be careful with the LC_* settings! Depending on the locale, the transliteration result may change.
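
A minimal sketch of that locale sensitivity, assuming ASCII as the target charset (so accented letters fall outside it and have to be transliterated) and assuming an en_US.UTF-8 locale is installed on the machine. The result is deliberately not shown, because it varies with the iconv implementation and the active locale:

```php
<?php
// Remember the current LC_CTYPE so it can be restored afterwards.
$previous = setlocale(LC_CTYPE, '0');

// Assumption: en_US.UTF-8 exists on this system; setlocale() returns false if not.
$applied = setlocale(LC_CTYPE, 'en_US.UTF-8');

$text = 'áéíóú';

// iconv() returns false if a character cannot be converted at all;
// @ only silences the accompanying notice in this demo.
$ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

// Inspect both values: the transliteration depends on platform and locale.
var_dump($applied, $ascii);

// Restore the previous locale setting.
setlocale(LC_CTYPE, $previous);
```

Running this under different locales is an easy way to see whether a given server produces plain letters, question marks, or something else.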

answered Sep 20 '22 21:09

Fabian Blechschmidt

For a function that actually removes the accents, the best I have found so far is remove_accents($string) in the WordPress core: https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L1127

(Note: I have filed a bug against it so that they can adopt an updated version I provided, which documents each character and how it is translated, so it may change in the future.)

answered Sep 20 '22 21:09

John Schlick
