
How to compare Unicode strings in Javascript?

When I evaluate "Ł" > "Z" in JavaScript, it returns true. In alphabetical order it should of course be false, since Ł comes before Z. How can I fix this? My site uses UTF-8.

Tomasz Wysocki asked Sep 02 '10 19:09


3 Answers

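You can use Intl.Collator, or String.prototype.localeCompare with a locale argument. Both are specified by the ECMAScript Internationalization API:

"Ł".localeCompare("Z", "pl");              // -1
new Intl.Collator("pl").compare("Ł","Z");  // -1

A negative result means that Ł sorts before Z, which is what you want. (The specification only guarantees a negative number, not exactly -1.)

Note that this only works in browsers that implement the Internationalization API; older engines ignore the locale argument of localeCompare and do not provide Intl.Collator.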
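To sort a whole array this way, the collator's compare function can be passed straight to sort(). The following is only a sketch: makeComparator and the sample city names are made up for illustration, and the fallback branch simply reproduces the plain code-unit comparison from the question for engines without Intl.

```javascript
// Hypothetical helper: returns a locale-aware comparator when Intl.Collator
// is available, otherwise a plain code-unit comparator.
function makeComparator(locale) {
  if (typeof Intl === "object" && typeof Intl.Collator === "function") {
    // compare is a getter that returns a bound function, so it can be
    // handed to Array.prototype.sort directly.
    return new Intl.Collator(locale).compare;
  }
  return function (a, b) {
    return a < b ? -1 : a > b ? 1 : 0;
  };
}

const cities = ["Zakopane", "Łódź", "Lublin"];
// slice() keeps the original array untouched.
const sortedCities = cities.slice().sort(makeComparator("pl"));
console.log(sortedCities); // [ 'Lublin', 'Łódź', 'Zakopane' ] where Intl.Collator is available
```

Creating the collator once and reusing its compare function is usually cheaper than calling localeCompare with a locale for every pair.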

Oriol answered Sep 21 '22 22:09


Here is an example for the French alphabet that could help you build a custom sort:

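var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;                          // 1 = ascending, -1 = descending
  caseSensitive = caseSensitive || false;
  return function(a, b){
    var pos = 0, min;
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    // measure after lowercasing, which can change a string's length
    min = Math.min(a.length, b.length);
    // skip the common prefix
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    // equal strings return 0; otherwise the shorter string sorts first
    if(pos === min){ return dir * (a.length - b.length); }
    return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
      dir : -dir;
  };
};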

To use it on an array of strings a:

a.sort(
  alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);

Add 1 or -1 as the second parameter of alpha() to sort ascending or descending.
Add true as the third parameter to make the sort case-sensitive.

You may need to add digits and special characters to the alphabet string; characters that are missing from it sort before all listed characters.
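The same idea can be sketched with a lookup table instead of repeated indexOf() calls. This is an illustrative variant, not the code from this answer: makeAlphabetComparator, the Polish alphabet string, and the sample words are assumptions, and characters missing from the alphabet are placed after all listed ones, in code point order.

```javascript
// Hypothetical helper: builds a comparator from a custom alphabet string.
function makeAlphabetComparator(alphabet) {
  const rank = new Map();
  // Normalize so that precomposed and decomposed accents compare alike.
  Array.from(alphabet.normalize("NFC")).forEach(function (ch, i) {
    rank.set(ch, i);
  });
  // Unknown characters sort after the alphabet, by code point.
  function rankOf(ch) {
    return rank.has(ch) ? rank.get(ch) : rank.size + ch.codePointAt(0);
  }
  return function (a, b) {
    const as = Array.from(a.normalize("NFC"));
    const bs = Array.from(b.normalize("NFC"));
    const min = Math.min(as.length, bs.length);
    for (let i = 0; i < min; i++) {
      const diff = rankOf(as[i]) - rankOf(bs[i]);
      if (diff !== 0) return diff;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return as.length - bs.length;
  };
}

const polishLower = "aąbcćdeęfghijklłmnńoóprsśtuwyzźż";
const words = ["zebra", "łoś", "las", "ćma", "cytryna"];
words.sort(makeAlphabetComparator(polishLower));
console.log(words); // [ 'cytryna', 'ćma', 'las', 'łoś', 'zebra' ]
```

The map lookup keeps each character comparison constant-time, which matters mostly for long alphabets or large arrays.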

Mic answered Sep 20 '22 22:09


You may be able to build your own sorting function using localeCompare(), which, at least according to the MDC article on the topic, should sort things correctly.
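Such a sorting function might look like the sketch below. The byProperty helper and the sample records are invented for illustration; only localeCompare() itself comes from the suggestion above, and the "pl" locale argument is honoured only by engines that support it.

```javascript
// Hypothetical helper: returns a comparator that orders objects by one
// string property using localeCompare() with the given locale.
function byProperty(key, locale) {
  return function (a, b) {
    return String(a[key]).localeCompare(String(b[key]), locale);
  };
}

const people = [{ name: "Zenon" }, { name: "Łukasz" }, { name: "Adam" }];
people.sort(byProperty("name", "pl"));

const names = people.map(function (p) { return p.name; });
console.log(names);
```

On an engine with locale support this lists Łukasz before Zenon, which a plain < comparison would not.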

If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.

Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting - maybe also worth a look.

Edit: As a totally far-out idea (I have no idea whether this is feasible at all, especially because of performance concerns): if you are working with PHP/MySQL on the back end anyway, you could send an Ajax request to the server and have MySQL do the sorting. MySQL is great at sorting locale-aware data, because you can force a sort operation into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci or COLLATE utf8_german2_ci. Those collations would take care of all sorting woes at once.

Pekka answered Sep 20 '22 22:09