
Sorting Special Characters in JavaScript (Æ)

I'm trying to sort an array of objects based on the objects' name property. Some of the names start with 'Æ', and I'd like for them to be sorted as though they were 'Ae'. My current solution is the following:

myArray.sort(function(a, b) {
  var aName = a.name.replace(/Æ/gi, 'Ae'),
      bName = b.name.replace(/Æ/gi, 'Ae');
  return aName.localeCompare(bName);
});

I feel like there should be a better way of handling this without having to manually replace each and every special character. Is this possible?

I'm doing this in Node.js if it makes any difference.

dontGoPlastic asked Sep 09 '12 20:09


1 Answer

There is no simpler way. Unfortunately, even the way described in the question is too simple, at least if portability is of any concern.

The localeCompare method is by definition implementation-dependent. It usually depends on the UI language of the underlying operating system, though it may also differ between browsers (or other JavaScript implementations) on the same computer. It can be hard to find any documentation on it, so even if you aim at writing non-portable code, you might need to do a lot of testing to see which collation order is applied. Cf. "Sorting strings is much harder than you thought!"

So to have a controlled and portable comparison, you need to code it yourself, unless you are lucky enough to find someone else’s code that happens to suit your needs. On the positive side, the case conversion methods are one of the few parts of JavaScript that are localization-ready: they apply Unicode case mapping rules, so e.g. 'æ'.toUpperCase() yields Æ in any implementation.
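As a small illustration of the case mapping point, the following sketch upper-cases a few non-ASCII letters. The variable names are made up for this example. Note one consequence of Unicode case mapping that matters for German: 'ß' becomes the two letters 'SS'.

```javascript
// Unicode case mapping does not depend on the host locale.
var lower = ['æ', 'ä', 'ö', 'ü', 'é'];
var upper = lower.map(function (ch) {
  return ch.toUpperCase();
});
console.log(upper.join(' ')); // Æ Ä Ö Ü É

// Special casing: a single character may map to more than one.
var german = 'straße'.toUpperCase();
console.log(german); // STRASSE
```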

In general, sorting strings requires a complicated function that applies specific sorting rules as defined for a language or by some other rules, such as the Pan-European sorting rules (intended for multilingual content). But if we can limit ourselves to sorting rules that deal with just a handful of letters in addition to ASCII, we can use code like the following simplified sorting for German (extract from my book Going Global with JavaScript and Globalize.js):

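// Map upper-case German umlauts to their base letters.
// Intended to be called on a string that has already been upper-cased.
String.prototype.removeUmlauts = function () {
  return this.replace(/Ä/g,'A').replace(/Ö/g,'O').replace(/Ü/g,'U');
};
// Comparison function for Array.prototype.sort:
// ignores case and treats Ä, Ö, Ü as A, O, U.
function alphabetic(str1, str2) {
  var a = str1.toUpperCase().removeUmlauts();
  var b = str2.toUpperCase().removeUmlauts();
  return a < b ? -1 : a > b ? 1 : 0;
}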

You could add other mappings to this, such as replace(/Æ/g, 'AE') (upper case, because the strings have already been converted with toUpperCase()), after analyzing the characters that may appear and deciding how to deal with them. Removing diacritic marks (e.g. mapping É to E) is simplistic but often good enough, and surely better than leaving it to implementations to decide whether É is somewhere after Z. At least you would get consistent results across implementations, and you would see what goes wrong and needs fixing, instead of waiting for other users to complain that your code sorts all wrong (in their environment).
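A minimal sketch of such an extended comparison, applied to objects with a name property as in the question, might look like this. The names sortKey, compareNames and people are hypothetical, and the sketch assumes a runtime that supports String.prototype.normalize (ES2015 or later). It is not taken from the book. It removes diacritic marks generically instead of listing each letter, which is a different technique from the explicit umlaut replacements above.

```javascript
// Hypothetical helper: fold a string into a simple key for comparison.
function sortKey(str) {
  return str
    .toUpperCase()                    // Unicode case mapping, e.g. 'æ' -> 'Æ'
    .replace(/Æ/g, 'AE')              // explicit mapping chosen for this data
    .normalize('NFD')                 // split 'É' into 'E' + combining accent
    .replace(/[\u0300-\u036f]/g, ''); // drop the combining diacritical marks
}

// Comparison function for Array.prototype.sort on objects with a name.
function compareNames(a, b) {
  var ka = sortKey(a.name);
  var kb = sortKey(b.name);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

// Made-up sample data for illustration.
var people = [
  { name: 'Zoe' },
  { name: 'Ægir' },
  { name: 'Adam' },
  { name: 'Émile' },
  { name: 'Af' }
];
people.sort(compareNames);

var sortedNames = people.map(function (p) { return p.name; });
console.log(sortedNames.join(', ')); // Adam, Ægir, Af, Émile, Zoe
```

Because the key is built only from explicit replacements, Unicode normalization and plain code unit comparison, the result does not depend on the locale of the machine running the code. Letters without a canonical decomposition, such as Æ or Ø, are not affected by the normalization step and need their own replace call if they can occur in the data.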

Jukka K. Korpela answered Sep 25 '22 06:09