
Natural sorting algorithm in PHP with support for Unicode?


Is it possible to sort an array of Unicode / UTF-8 strings in PHP using a natural order algorithm? For example (this array is already in the correct order):

$array = array
(
    0 => 'Agile',
    1 => 'Ágile',
    2 => 'Àgile',
    3 => 'Âgile',
    4 => 'Ägile',
    5 => 'Ãgile',
    6 => 'Test',
);

If I try with asort($array) I get the following result:

Array
(
    [0] => Agile
    [6] => Test
    [2] => Àgile
    [1] => Ágile
    [3] => Âgile
    [5] => Ãgile
    [4] => Ägile
)

And using natsort($array):

Array
(
    [2] => Àgile
    [1] => Ágile
    [3] => Âgile
    [5] => Ãgile
    [4] => Ägile
    [0] => Agile
    [6] => Test
)

How can I implement a function that returns the correct result order (0, 1, 2, 3, 4, 5, 6) under PHP 5? All the multibyte string functions (mbstring, iconv, ...) are available on my system.

EDIT: I want to natsort() the values, not the keys. The only reason I'm explicitly defining the keys (and using asort() instead of sort()) is to make it easier to see where the sorting of Unicode values went wrong.

Alix Axel asked May 07 '09 03:05





2 Answers

The question is not as easy to answer as it seems at first glance. This is one of the areas where PHP's lack of Unicode support hits you with full force.

First of all, natsort(), as suggested by other posters, has nothing to do with sorting arrays of the type you want to sort. What you're looking for is a locale-aware sorting mechanism, because the order of strings with extended characters always depends on the language in use. Take German, for example: A and Ä can sometimes be sorted as if they were the same letter (DIN 5007/1), and sometimes Ä is sorted as if it were in fact "AE" (DIN 5007/2). In Swedish, in contrast, Ä comes at the end of the alphabet.
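To make the language dependence concrete, here is a minimal sketch that collates the same words under three sets of rules. It assumes the intl extension is loaded and that ICU's data for the locale identifiers 'de', 'de@collation=phonebook' and 'sv' follows the conventions described above; collateWith() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Sketch only: collateWith() is a hypothetical helper; requires ext/intl.
function collateWith($locale, array $words)
{
    $collator = new Collator($locale);
    $collator->sort($words); // sorts in place and reindexes the array
    return $words;
}

$words = array('Zebra', 'Ärger', 'Affe', 'Adler');

if (extension_loaded('intl')) {
    // ICU locale identifiers; the labels restate the conventions from the text.
    $locales = array(
        'de'                     => 'German, Ä grouped with A',
        'de@collation=phonebook' => 'German, Ä treated like AE',
        'sv'                     => 'Swedish, Ä after Z',
    );
    foreach ($locales as $locale => $label) {
        echo $label, ': ', implode(', ', collateWith($locale, $words)), "\n";
    }
}
```

The point of the sketch is that there is no single "correct" order for accented strings: the result depends on which locale's rules the collator applies.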

If you don't use Windows, you're in luck, as PHP provides functions for exactly this. Using a combination of setlocale(), usort(), strcoll() and the correct UTF-8 locale for your language, you get something like this:

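$array = array('Àgile', 'Ágile', 'Âgile', 'Ãgile', 'Ägile', 'Agile', 'Test');
// Passing '0' only queries the current collation locale without changing it.
$oldLocale = setlocale(LC_COLLATE, '0');
// setlocale() returns false if the requested locale is not installed.
setlocale(LC_COLLATE, '<<your_RFC1766_language_code>>.utf8');
usort($array, 'strcoll');
// Restore the previous collation locale.
setlocale(LC_COLLATE, $oldLocale);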

Please note that you must use the UTF-8 variant of the locale in order to sort UTF-8 strings. I reset the locale in the example above to its original value because the locale is maintained per process, so setting it with setlocale() can have side effects on other PHP scripts running in the same process; please see the PHP manual for more details.
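The save, set, sort, restore sequence can be wrapped in a small function so the locale change cannot leak. This is a sketch under assumptions: sortWithLocale() is a hypothetical helper, and the locale names passed to it are examples that may be spelled differently on your system (check `locale -a`).

```php
<?php
/**
 * Sort strings with strcoll() under a temporary LC_COLLATE locale.
 * sortWithLocale() is a hypothetical helper, not a PHP built-in.
 *
 * @param array $values  Strings to sort.
 * @param array $locales Candidate locale names, tried in order.
 * @return array|false   Sorted copy, or false if no candidate could be set.
 */
function sortWithLocale(array $values, array $locales)
{
    // '0' queries the current setting without changing it.
    $previous = setlocale(LC_COLLATE, '0');

    // setlocale() accepts an array and applies the first name that works.
    if (setlocale(LC_COLLATE, $locales) === false) {
        return false; // nothing was changed, so nothing to restore
    }

    usort($values, 'strcoll');

    // Restore the caller's collation locale.
    setlocale(LC_COLLATE, $previous);

    return $values;
}

// Assumed locale names for German; adjust to what your system provides.
$sorted = sortWithLocale(
    array('Test', 'Ägile', 'Agile'),
    array('de_DE.utf8', 'de_DE.UTF-8')
);
var_dump($sorted);
```

Checking the return value matters because a missing locale otherwise fails silently and strcoll() keeps comparing under whatever locale was active before.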

If you are on a Windows machine, there is currently no solution to this problem, and I assume there won't be one before PHP 6. Please see my own question on SO about this specific problem.

Stefan Gehrig answered Oct 07 '22 14:10



Nailed it!

$array = array('Ägile', 'Ãgile', 'Test', 'カタカナ', 'かたかな', 'Ágile', 'Àgile', 'Âgile', 'Agile');

function Sortify($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1' . chr(255) . '$2', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}

array_multisort(array_map('Sortify', $array), $array);

Output:

Array
(
    [0] => Agile
    [1] => Ágile
    [2] => Âgile
    [3] => Àgile
    [4] => Ãgile
    [5] => Ägile
    [6] => Test
    [7] => かたかな
    [8] => カタカナ
)
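To see why this works, it can help to print the sort keys that the transformation produces. The sketch below repeats the same regular expression under the hypothetical name sortKey() so that it runs on its own, and shows the 0xFF byte as "|" for readability.

```php
<?php
// Same transformation as Sortify() above, repeated so this sketch runs alone.
// sortKey() is a hypothetical name used only for this illustration.
function sortKey($string)
{
    return preg_replace(
        '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i',
        '$1' . chr(255) . '$2',
        htmlentities($string, ENT_QUOTES, 'UTF-8')
    );
}

foreach (array('Agile', 'Ágile', 'Àgile', 'Ägile') as $word) {
    // Show byte 0xFF as "|" so the key is readable on a terminal.
    echo $word, ' => ', str_replace(chr(255), '|', sortKey($word)), "\n";
}
```

Each accented letter becomes its base letter, then a 0xFF byte, then the entity suffix, so 'Ágile' yields the key 'A|acutegile'. Because 0xFF compares higher than any ASCII byte, the unaccented word sorts first, and the accented variants follow in alphabetical order of their entity names (acute, circ, grave, tilde, uml). That order is a side effect of the entity names, not a linguistic rule, and only characters with a matching named HTML entity are affected.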

Even better:

if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}

Thanks to @tchrist!
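If the strings also contain numbers, the collator can be asked to compare digit runs by value, which gives natsort()-like behaviour on top of Unicode collation. This is a minimal sketch, assuming the intl extension is available; naturalUnicodeSort() is a hypothetical helper name, and 'root' is the same locale used above.

```php
<?php
// Sketch: natural (numeric-aware) ordering combined with Unicode collation.
// naturalUnicodeSort() is a hypothetical helper; requires ext/intl.
function naturalUnicodeSort(array $values, $locale = 'root')
{
    $collator = new Collator($locale);
    // Compare runs of digits by numeric value, so "2" sorts before "10".
    $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
    // asort() keeps the key => value association, like natsort() does.
    $collator->asort($values);
    return $values;
}

if (extension_loaded('intl')) {
    print_r(naturalUnicodeSort(array('Ágile 10', 'Agile 2', 'Ágile 2', 'Test 1')));
}
```

Passing a language-specific locale instead of 'root' applies that language's rules while keeping the numeric comparison.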

Alix Axel answered Oct 07 '22 16:10
