
Chinese Sorting by Pinyin in Javascript with localeCompare?

I am facing quite a challenge here. I am to sort certain Chinese "expressions" by pinyin.

The question:
How could I sort by pinyin in Firefox?
Is there a way to sort properly in IE 9 and 10? (They are also to be supported by the website)

Example:

  1. 财经传讯公司
  2. 财经顾问
  3. 房地产及按揭

According to a translator agency, this is what the sort order of the words should be. The translations are as follows:

  1. Financial communication agencies
  2. Financial consultancies
  3. Real estate and mortgages

The pronunciations in the Latin alphabet:

  1. cai jing chuan xun gong si
  2. cai jing gu wen
  3. fang di chan ji an jie

String.localeCompare: MDN Docs

From what I understand, I need to pass a second argument to String.localeCompare: a BCP 47 language tag that tells the method to sort by pinyin, which should be zh-CN-u-co-pinyin.

So the full code should look like this:

var arr = [ "财经传讯公司", "财经顾问", "房地产及按揭"];
console.dir(arr.sort(function(a, b){
    return a.localeCompare(b, [ "zh-CN-u-co-pinyin" ]); 
}));

jsFiddle working example

I expected this to log to console the expressions in the order I entered them in the array but the output differs.

In Firefox 27, the order is: 3, 1, 2
In Chrome 33: 1, 2, 3
In IE 11: 1, 2, 3
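
As a point of comparison, here is a minimal sketch of what a sort with no locale awareness produces for the same three phrases. The variable names are illustrative. Sorting by raw UTF-16 code units gives the same 3, 1, 2 order reported for Firefox 27, which suggests (the question does not confirm it) that the locales argument had no effect there.

```javascript
const phrases = ["财经传讯公司", "财经顾问", "房地产及按揭"];

// Array.prototype.sort without a comparator orders strings by UTF-16 code units,
// so 房 (U+623F) comes before 财 (U+8D22) regardless of pronunciation.
const byCodeUnit = [...phrases].sort();

console.log(byCodeUnit);
```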

Note:

Pinyin is the official phonetic system for transcribing the Mandarin pronunciations of Chinese characters into the Latin alphabet.

Daniel V. asked Apr 07 '14 08:04




3 Answers

This works on Chrome:

const arr = ["博","啊","吃","世","中","超"]
arr.sort((x,y)=>x.localeCompare(y, 'zh-CN'))
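
A related sketch, assuming a runtime with the ECMAScript Internationalization API and Chinese collation data: Intl.Collator lets one comparator be reused for the whole sort, and resolvedOptions() shows which locale and collation the runtime actually picked. The tag zh-CN-u-co-pinyin comes from the question; the variable names are illustrative.

```javascript
// One collator is created up front instead of resolving the locale on every comparison.
const collator = new Intl.Collator("zh-CN-u-co-pinyin");

// resolvedOptions() reports what the runtime selected; if Chinese collation data
// is missing, the locale falls back to the runtime default.
const { locale, collation } = collator.resolvedOptions();

const words = ["房地产及按揭", "财经顾问", "财经传讯公司"];
const sortedWords = [...words].sort(collator.compare);

console.log(locale, collation, sortedWords);
```

Checking the resolved locale before trusting the order is a cheap way to notice a runtime that silently fell back to another language.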
soulmachine answered Oct 17 '22 01:10


In general, people use the following method to sort Chinese characters by pinyin:

var list=[' king ', 'a', 'li'];  
list.sort(function (a, b) {return a.localeCompare(b); });

localeCompare(): compares two strings using a locale-specific order.

This approach to pinyin sorting is unreliable.

It depends heavily on the Chinese operating system and on the browser engine. That is, if your visitors are not on a Chinese system, or use a browser other than Internet Explorer (such as Chrome), they will probably not see the pinyin order we expect.

Here I'll introduce my solution to this problem, in the hope that it is a useful starting point. The method supports the 20,902 consecutive characters in the Unicode range 0x4e00 to 0x9fa5, that is, the CJK (Chinese, Japanese, Korean) characters used in China (including Taiwan), Japan, and South Korea.

var CompareStrings = {
    // Pinyin-sorted character table; its contents are elided in the original.
    db: '.........',

    // Map a character to a sortable number.
    getOrderedUnicode: function (char) {
        var originalUnicode = char.charCodeAt(0);
        if (originalUnicode >= 0x4e00 && originalUnicode <= 0x9fa5) {
            var index = this.db.indexOf(char);
            // indexOf returns -1 when the character is not in the table
            if (index > -1) {
                return index + 0x4e00;
            }
        }
        return originalUnicode;
    },

    compare: function (a, b) {
        if (a == b) { return 0; }

        // This can be rewritten as needed; as written, the empty string sorts last.
        if (a.length == 0) { return 1; }
        if (b.length == 0) { return -1; }

        var count = a.length > b.length ? b.length : a.length;

        for (var i = 0; i < count; i++) {
            var au = this.getOrderedUnicode(a[i]);
            var bu = this.getOrderedUnicode(b[i]);
            if (au > bu) {
                return 1;
            } else if (au < bu) {
                return -1;
            }
        }

        // Equal over the shared prefix: the longer string sorts later.
        return a.length > b.length ? 1 : -1;
    }
};
// Override the native localeCompare
String.prototype.localeCompare = function (param) {
    return CompareStrings.compare(this.toString(), param);
};

You can download the complete code through the link below.

A brief introduction to how the implementation works:

  1. Build the pinyin-sorted character table (db): there are many ways to do this. I used a combination of JavaScript and C#: a script first enumerates all the Chinese characters, submits them to a C# back end to be sorted, and the result is output to the front end. This is only preparation, so any method will do.

  2. Decide which of two characters is greater (getOrderedUnicode): sorting has to handle not only Chinese characters but also everything else, so the comparator must be able to rank every character. We first check whether a character is a Chinese character. If it is, we look up its index in the sorted table and add the Unicode position of the first Chinese character, which gives a "calibrated" Unicode index. If it is not, we return its Unicode code value directly.

  3. Compare two strings (compare): compare the characters of the two strings one by one within the effective range, that is, the length of the shorter string. If a character of a is greater than the corresponding character of b, return 1; in the opposite case, return -1.

  4. If the strings are still tied after the comparison within the effective range, the longer one sorts later. For example, with a='123' and b='1234', the longer b goes after a.
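
The four steps above can be sketched in a compact, runnable form. This is only an illustration under stated assumptions: SAMPLE_DB is a hypothetical four-character table ordered by hand (cai, chuan, fang, gu), standing in for the full sorted table that the answer elides, and the names rank, orderedCode, and comparePinyin are not from the original code. Unlike the answer's version, the empty string sorts first here.

```javascript
// Hypothetical, hand-ordered sample table (cai, chuan, fang, gu). A real table
// would list every character from 0x4e00 to 0x9fa5 in pinyin order.
const SAMPLE_DB = "财传房顾";

// Precompute character -> rank once so each lookup is O(1) instead of an indexOf scan.
const rank = new Map([...SAMPLE_DB].map((ch, i) => [ch, i]));

// Step 2: "calibrated" code for table characters, raw code unit for everything else.
function orderedCode(ch) {
  return rank.has(ch) ? 0x4e00 + rank.get(ch) : ch.charCodeAt(0);
}

// Steps 3 and 4: compare over the shared prefix, then let the shorter string sort first.
function comparePinyin(a, b) {
  const count = Math.min(a.length, b.length);
  for (let i = 0; i < count; i++) {
    const diff = orderedCode(a[i]) - orderedCode(b[i]);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return a.length - b.length;
}

const sorted = ["房地产及按揭", "财经顾问", "财经传讯公司"].sort(comparePinyin);
console.log(sorted);
```

With so small a table, characters outside it are ranked by raw code unit and can interleave with table characters, which is why the original approach relies on a table covering the whole range.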

EDIT

You can also use a sorting plugin for the jQuery DataTables extension:

jQuery.extend( jQuery.fn.dataTableExt.oSort, {
    "chinese-string-asc" : function (s1, s2) {
        return s1.localeCompare(s2);
    },
    "chinese-string-desc" : function (s1, s2) {
        return s2.localeCompare(s1);
    }
} );

See the original post.

sharkbait answered Oct 17 '22 01:10


According to MDN, the locales and options arguments of localeCompare() were added in Firefox 29. You should be able to sort by pinyin now.
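
For browsers older than that, a feature check along the lines MDN describes may help. This is a sketch rather than a complete solution: it relies on the rule that an engine which processes the locales argument must throw RangeError for an invalid language tag such as "i". The function names are illustrative, and returning null as the fallback signal is an assumption of this example.

```javascript
// Engines that process the locales argument reject the invalid tag "i" with a
// RangeError; engines that ignore the argument simply return a number.
function localeCompareSupportsLocales() {
  try {
    "a".localeCompare("b", "i");
  } catch (e) {
    return e.name === "RangeError";
  }
  return false;
}

// Returns a sorted copy, or null when the caller needs another comparator
// (hypothetical convention for this sketch).
function sortByPinyin(list) {
  if (!localeCompareSupportsLocales()) {
    return null;
  }
  return [...list].sort((a, b) => a.localeCompare(b, "zh-CN-u-co-pinyin"));
}

console.log(sortByPinyin(["房地产及按揭", "财经顾问", "财经传讯公司"]));
```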

Xhacker Liu answered Oct 17 '22 02:10