
Sorting with localeCompare in JavaScript

I am working with the JavaScript code shown below:

let arr = [
  '1 Hello',
  '2 Hello',
  '3 Hello',
  '4 Hello',
  ';1',
  'z',
  '%1',
  '110 Hello',
  '100 Hello',
  'a',
  'Z',
  '00',
  '21 Hello',
  '9  Hello',
  '13 Hello',
  '10000 Hello',
  '0 Hello',
  'A'
  ];


// Sort with a locale-aware comparator; numeric: true compares digit runs by value
arr.sort( (a, b) => {
  return a.localeCompare(b, 'en', {
    numeric: true
  })
} ).forEach( ml => { console.log(ml) });

The above code prints the following output:

;1
%1
00
0 Hello
1 Hello
2 Hello
3 Hello
4 Hello
9  Hello
13 Hello
21 Hello
100 Hello
110 Hello
10000 Hello
a
A
z
Z
=> undefined

Problem Statement:

Why does ;1 come before %1 in the output, and how are the other strings being ordered here?

asked Oct 23 '18 at 04:10 by john


1 Answer

TL;DR: the ordering is browser-specific and was only recently "standardized".

I assume the question is not about the .sort() array method in JS, but about the results of the .localeCompare() string method. The MDN description of the method (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare) gives this definition:

The localeCompare() method returns a number indicating whether a reference string comes before or after or is the same as the given string in sort order.

MDN also links to the specification: https://www.ecma-international.org/ecma-262/6.0/#sec-string.prototype.localecompare . The specification contains this excerpt:

When the localeCompare method is called with argument that, it returns a Number other than NaN that represents the result of a locale-sensitive String comparison of the this value (converted to a String) with that (converted to a String). The two Strings are S and That. The two Strings are compared in an implementation-defined fashion. The result is intended to order String values in the sort order specified by a host default locale...
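The excerpt above only promises a number whose sign tells you the order; the exact value is left to the implementation. A minimal sketch of reading that result follows. The helper name describeOrder is hypothetical, introduced here only for illustration, and the last call is left without an expected result because it depends on the engine's collation data:

```javascript
// Hypothetical helper: interpret only the sign of localeCompare's result,
// since the magnitude is not fixed by the specification.
function describeOrder(a, b, locale = 'en') {
  const result = a.localeCompare(b, locale);
  if (result < 0) return 'before';
  if (result > 0) return 'after';
  return 'equal';
}

console.log(describeOrder('a', 'b')); // before
console.log(describeOrder('b', 'a')); // after
console.log(describeOrder('a', 'a')); // equal

// The pair from the question; the answer depends on the engine's collation rules.
console.log(describeOrder(';1', '%1'));
```

Checking result < 0 or result > 0, rather than comparing against -1 or 1, keeps the code portable across engines.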

There is a "bug" report for Chromium titled "localeCompare implementation differs from other browsers" (https://bugs.chromium.org/p/v8/issues/detail?id=459), with examples like:

in v8 version 1.3.13.5, i get these results for the final sort:
A,R,Z,a,q,z,ä,æ

safari produces:
A,a,ä,æ,q,R,Z,z

firefox produces:
a,A,ä,æ,q,R,z,Z

One of the replies to that report says:

Marking as feature request as the current implementation does not violate the spec.
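For comparison, the v8 result quoted above coincides with plain UTF-16 code unit order, which is what Array.prototype.sort produces when no comparator is passed. The sketch below uses a letter list assumed from the quoted results, not the bug report's original test case, and leaves the locale-aware output uncommented because it varies by engine:

```javascript
// Letters assumed from the quoted results (\u00e4 is ä, \u00e6 is æ).
const letters = ['a', 'A', 'z', 'Z', 'q', 'R', '\u00e4', '\u00e6'];

// No comparator: elements are ordered by UTF-16 code unit values.
const byCodeUnit = [...letters].sort();
console.log(byCodeUnit.join(',')); // A,R,Z,a,q,z,ä,æ

// Locale-aware comparator: the order depends on the engine's collation data.
const byLocale = [...letters].sort((x, y) => x.localeCompare(y, 'en'));
console.log(byLocale.join(','));
```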

As far as I know, the comparison now follows the ECMAScript Internationalization (Intl) API specification. You can find more about the comparison rules there: https://www.ecma-international.org/ecma-402/2.0/#collator-objects . That document also has this note on the collation options:

Unicode Technical Standard 35 describes ten locale extension keys that are relevant to collation: "co" for collator usage and specializations, "ka" for alternate handling, "kb" for backward second level weight, "kc" for case level, "kn" for numeric, "kh" for hiragana quaternary, "kk" for normalization, "kf" for case first, "kr" for reordering, "ks" for collation strength, and "vt" for variable top.

So you can tweak the ordering to a certain extent through these options. I hope this is helpful, thanks :)
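As a rough illustration of that tweaking, here is a sketch using Intl.Collator with the numeric and caseFirst options, plus the equivalent "kn" and "kf" locale extension keys. It assumes an engine that ships collation data and honors these keys (current browsers and Node.js builds with ICU generally do); the variable names are only for illustration:

```javascript
// Default 'en' collation compares digits one at a time, so '2' sorts after '10'.
const plain = new Intl.Collator('en');
console.log(plain.compare('2 Hello', '10 Hello') > 0); // true

// numeric: true compares runs of digits by numeric value, so 2 sorts before 10.
const numeric = new Intl.Collator('en', { numeric: true });
console.log(numeric.compare('2 Hello', '10 Hello') < 0); // true

// caseFirst: 'upper' puts the uppercase form ahead of its lowercase form.
const upperFirst = new Intl.Collator('en', { caseFirst: 'upper' });
const sortedUpperFirst = ['a', 'A', 'z', 'Z'].sort(upperFirst.compare);
console.log(sortedUpperFirst.join(',')); // A,a,Z,z

// The same settings requested through locale extension keys ("kn", "kf").
const viaKeys = new Intl.Collator('en-u-kf-upper-kn-true');
console.log(viaKeys.resolvedOptions());
```

Reusing one Intl.Collator instance for a whole sort also avoids resolving the locale and options again on every comparison, which is what happens when localeCompare is called with a locale argument inside the comparator.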

answered Oct 21 '22 at 06:10 by Georgy