
sort() for Japanese

If I have set my current locale to Japanese, how can I make Japanese characters always sort ahead of non-Japanese characters? For example, right now English characters always appear before the Katakana characters. How can I reverse this?

Sorry for not being very clear. As you can see here.

The final results have Java, NVIDIA and Windows ファイアウォール ranked as the first three, ahead of the Japanese entries. Is it possible to have those at the end?

hao asked Apr 18 '11 03:04


2 Answers

Use usort() instead of sort() so you can define your own comparison criteria.

Try this simple method. I have tried it with the example from here, and it works.

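  // Comparator: strings whose first byte is non-ASCII sort before ASCII ones;
  // within each group the order is plain byte-wise (strcmp).
  function mccompare($a, $b) {
    // ord() sees only the first byte; >= 128 marks a multi-byte UTF-8 lead byte
    $fca = ord(substr($a, 0, 1));
    $fcb = ord(substr($b, 0, 1));
    if (($fca >= 128) === ($fcb >= 128)) {
      return strcmp($a, $b);      // same group: byte order, 0 when equal
    }
    return $fca >= 128 ? -1 : 1;  // different groups: non-ASCII first
  }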

  usort ($your_array, "mccompare");

So for this example

  setlocale(LC_COLLATE, "jpn");

  $your_array = array ("システム", "画面", "Windows ファイウォール",
      "インターネット オプション",  "キーボード", "メール", "音声認識", "管理ツール",
      "自動更新", "日付と時刻", "タスク", "プログラムの追加と削除", "フォント",
      "電源オプション", "マウス", "地域と言語オプション", "電話とモデムのオプション",
      "Java", "NVIDIA");

  usort ($your_array, "mccompare");
  print_r($your_array);

it prints an array like this:

Array
(
    [0] => インターネット オプション
    [1] => キーボード
    [2] => システム
    [3] => タスク
    [4] => フォント
    [5] => プログラムの追加と削除
    [6] => マウス
    [7] => メール
    [8] => 地域と言語オプション
    [9] => 日付と時刻
    [10] => 画面
    [11] => 管理ツール
    [12] => 自動更新
    [13] => 電源オプション
    [14] => 電話とモデムのオプション
    [15] => 音声認識
    [16] => Java
    [17] => NVIDIA
    [18] => Windows ファイウォール
)

Note: This is just a quick solution to the problem, not a perfect one. It only checks the first byte of each string being compared, but with some effort you can improve the function to check all multi-byte characters against Unicode and then decide whether $a <= $b or $a > $b.
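As a rough idea of what that improvement could look like, here is a minimal sketch that tests the first character against Unicode script properties instead of looking at a single byte. The function names starts_with_japanese and japanese_first_compare are made up for this illustration, and it assumes the strings are UTF-8 and that PCRE was built with Unicode support (the /u modifier). It only inspects the first character, so it is still not a full solution.

```php
<?php
// Hypothetical helper: true when the first character is hiragana, katakana or kanji.
// Assumes $s is valid UTF-8; preg_match returns false on invalid input, which
// this treats the same as "not Japanese".
function starts_with_japanese(string $s): bool {
    return preg_match('/^[\p{Hiragana}\p{Katakana}\p{Han}]/u', $s) === 1;
}

// Comparator for usort(): Japanese-leading strings first, everything else after.
// Inside each group the order is byte-wise (strcmp), as in the quick version.
function japanese_first_compare(string $a, string $b): int {
    $ja = starts_with_japanese($a);
    $jb = starts_with_japanese($b);
    if ($ja !== $jb) {
        return $ja ? -1 : 1;   // different groups: Japanese wins
    }
    return strcmp($a, $b);     // same group: 0 when equal
}
```

It is used the same way as before, for example usort($your_array, "japanese_first_compare"). Unlike the first-byte check, accented Latin text such as "Éclair" would stay in the non-Japanese group.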

Hope it works for you!

Wh1T3h4Ck5 answered Sep 23 '22 04:09


Ultimately, PHP's sort() leaves it to the underlying libc to implement the sort, and as shown in the article and my comment, not all libcs sort the same way. If you need to present a consistent collation, you will need to use something such as Collator, which uses a third-party library instead.
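A minimal sketch of how Collator might be combined with the asker's requirement, assuming the intl extension is installed and that 'ja_JP' is the locale you want. The function name collator_japanese_first is invented for this example. Collator on its own applies the locale's collation order, which may not put Japanese entries ahead of Latin ones, so the sketch keeps an explicit group check and uses Collator only to order strings within each group.

```php
<?php
// Hypothetical function: Japanese-leading strings first, each group ordered
// by the Collator for the given locale. Requires the intl extension.
function collator_japanese_first(array $items, string $locale = 'ja_JP'): array {
    $collator = new Collator($locale);
    // First character is hiragana, katakana or kanji (assumes UTF-8 input).
    $isJa = function (string $s): bool {
        return preg_match('/^[\p{Hiragana}\p{Katakana}\p{Han}]/u', $s) === 1;
    };
    usort($items, function (string $a, string $b) use ($collator, $isJa): int {
        $ja = $isJa($a);
        $jb = $isJa($b);
        if ($ja !== $jb) {
            return $ja ? -1 : 1;                // different groups: Japanese first
        }
        // Collator::compare() returns false on error; the cast turns that into 0.
        return (int) $collator->compare($a, $b);
    });
    return $items;
}
```

Because the within-group comparison no longer depends on libc, the result should be the same across platforms that ship the same collation data.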

Ignacio Vazquez-Abrams answered Sep 26 '22 04:09