
PHP array sorting and compatibility with Persian alphabets

I'm trying to sort an array first by its values and then by its keys, but PHP does not handle Persian characters well.
The Persian alphabet is similar to the Arabic alphabet except for some additional letters such as 'گ چ پ ژ ک'. PHP sorts the Arabic letters in the Persian alphabet correctly, but the additional letters end up out of order.

For example

$str = 'ا ب پ ت ث ج چ ح خ د ذ ر ز ژ ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی';
$arr = explode(' ', $str);

This creates an array ($arr) containing the Persian letters in the correct alphabetical order. If I shuffle it and use the asort function as follows:

shuffle($arr);
asort($arr);
var_dump($arr);

the result looks something like this:

    array
        2 => string 'ا'
        1 => string 'ب'
        22 => string 'ت'
        29 => string 'ث'
        20 => string 'ج'
        12 => string 'ح'
        21 => string 'خ'
        18 => string 'د'
        6 => string 'ذ'
        3 => string 'ر'
        27 => string 'ز'
        17 => string 'ص'
        11 => string 'ض'
        25 => string 'ط'
        5 => string 'ظ'
        16 => string 'ع'
        8 => string 'غ'
        26 => string 'ف'
        14 => string 'ق'
        9 => string 'ل'
        0 => string 'م'
        7 => string 'ن'
        10 => string 'ه'
        28 => string 'و'
        24 => string 'پ'
        23 => string 'چ'
        13 => string 'ژ'
        19 => string 'ک'
        4 => string 'گ'
        15 => string 'ی'

which is wrong!

The item with key 24 (پ) should come right after the item with key 1 (ب), the item with key 23 (چ) should come after the item with key 20 (ج), and so on.

How can I write a function that works like PHP's own sorting functions? Or is there a way to make the PHP functions work for Persian characters?

Farid Rn asked Apr 01 '14 20:04



2 Answers

I’ve written the following function to return the Unicode code point of a given UTF-8 encoded character:

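function utf8_ord($str) {
    $str = (string) $str;
    $ord = ord($str);
    $ord_b = decbin($ord);

    // Single-byte (ASCII) character: the byte value is the code point.
    if (strlen($ord_b) <= 7)
      return $ord;
    // The number of leading 1 bits is the length of the sequence in bytes.
    $len = strlen(strstr($ord_b, "0", true));

    if ($len < 2 || $len > 4 || strlen($str) < $len)
      return false;
    // The payload bits of the lead byte follow its first 0 bit.
    $val = substr($ord_b, $len + 1);

    for ($i = 1; $i < $len; $i++) {
        // Pad to 8 bits so a byte below 0x80 cannot pass as a continuation byte.
        $ord_b = str_pad(decbin(ord($str[$i])), 8, "0", STR_PAD_LEFT);
        if ($ord_b[0].$ord_b[1] != "10")
          return false;
        $val .= substr($ord_b, 2);
    }
    $val = bindec($val);
    return (($val > 0x10FFFF) ? null : $val);
}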
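
As a cross-check, PHP's own mb_ord() should report the same values. This is a minimal sketch, not part of the original answer; it assumes PHP 7.2 or later with the mbstring extension enabled, and the three sample letters are picked only for illustration.

```php
<?php
// Assumption: PHP 7.2+ with the mbstring extension enabled.
$letters = ['ب', 'پ', 'ت'];
$codePoints = [];

foreach ($letters as $letter) {
    // mb_ord() returns the code point of the first character, or false on failure.
    $codePoints[$letter] = mb_ord($letter, 'UTF-8');
}

print_r($codePoints);
```

Here پ maps to 1662, which is larger than the 1578 of ت, even though پ comes before ت in the alphabet.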

Now let’s find the code points of the characters in your array:

$str = 'ا ب پ ت ث ج چ ح خ د ذ ر ز ژ ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی';
$arr = explode(' ', $str);
print_r(array_map("utf8_ord", $arr));

The output will be:

Array
(
    [0] => 1575
    [1] => 1576
    [2] => 1662
    [3] => 1578
    [4] => 1579
    [5] => 1580
    [6] => 1670
    [7] => 1581
    [8] => 1582
    [9] => 1583
    [10] => 1584
    [11] => 1585
    [12] => 1586
    [13] => 1688
    [14] => 1589
    [15] => 1590
    [16] => 1591
    [17] => 1592
    [18] => 1593
    [19] => 1594
    [20] => 1601
    [21] => 1602
    [22] => 1705
    [23] => 1711
    [24] => 1604
    [25] => 1605
    [26] => 1606
    [27] => 1608
    [28] => 1607
    [29] => 1740
)

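This clearly shows that the code points do not follow the alphabetical order: the additional Persian letters (پ, چ, ژ, ک, گ and ی) have higher code points than the letters shared with Arabic. PHP's default string comparison is byte-wise, which for UTF-8 is the same as code point order, so those letters are sorted to the end. I don’t know Persian, so I’m unable to judge whether the Persian letters are placed wrongly in Unicode. All I can say is that PHP is doing its work correctly.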
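
If alphabetical order is needed without any extension, one option is a comparator that ranks letters by an explicit table. The following is only a sketch and not the answerer's method: `persian_compare` is a hypothetical name, the letter list is copied from the question as-is (it is the asker's list, not a verified full alphabet), and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical comparator: ranks each character by its position in the
// letter list from the question instead of by its code point.
function persian_compare(string $a, string $b): int
{
    static $rank = null;
    if ($rank === null) {
        $alphabet = 'ا ب پ ت ث ج چ ح خ د ذ ر ز ژ ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی';
        $rank = array_flip(explode(' ', $alphabet)); // letter => position
    }

    // Split both strings into UTF-8 characters (assumes valid UTF-8 input).
    $ca = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $cb = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);

    $n = min(count($ca), count($cb));
    for ($i = 0; $i < $n; $i++) {
        if ($ca[$i] === $cb[$i]) {
            continue;
        }
        // Characters missing from the table sort after all listed letters,
        // in byte order among themselves.
        $ra = $rank[$ca[$i]] ?? PHP_INT_MAX;
        $rb = $rank[$cb[$i]] ?? PHP_INT_MAX;
        return ($ra <=> $rb) ?: strcmp($ca[$i], $cb[$i]);
    }

    // One string is a prefix of the other: the shorter one comes first.
    return count($ca) <=> count($cb);
}

$arr = [10 => 'ت', 11 => 'پ', 12 => 'ب', 13 => 'گ', 14 => 'ک', 15 => 'ا'];
uasort($arr, 'persian_compare'); // uasort() keeps the keys, like asort()
print_r($arr);
```

A lookup table like this covers only the characters it lists, so it is a stopgap for simple data rather than a replacement for real collation rules.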

Sharanya Dutta answered Oct 13 '22 00:10


To get the available locales, you can use:

print_r(ResourceBundle::getLocales(''));

I had both 'fa' and 'fa_IR' available; however, 'fa_IR' still returned false, so I used 'fa' to test it:

setlocale(LC_ALL, 'fa');
asort($arr, SORT_LOCALE_STRING);
var_dump($arr);

But this still did not sort in the proper order for me.

After a bit more googling, the solution that finally worked for me to sort Unicode Persian letters was the Collator class:

$col = new \Collator('fa_IR');
$col->asort($arr);
var_dump($arr);
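
For a self-contained run of the same idea, here is a minimal sketch. It assumes the intl extension is installed (Collator does not exist without it); the sample letters and variable names are illustrative only.

```php
<?php
// Assumption: the intl extension is loaded; Collator is part of it.
$arr = ['ت', 'پ', 'ب', 'ژ', 'ز', 'ا'];

$col = new Collator('fa_IR');
$col->asort($arr); // keeps the keys, like PHP's asort()

// compare() returns a negative, zero, or positive integer, like strcmp().
$peBeforeTe = $col->compare('پ', 'ت') < 0;

print_r($arr);
var_dump($peBeforeTe);
```

Collator::compare() can also be used inside usort() or uasort() when the strings to compare are fields of larger records.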

I know the question is old, but this might still help people who land here looking for an answer.

Saeid answered Oct 12 '22 22:10