How do I check equality of Unicode strings in Javascript?


I have two strings in Javascript: "_strange_chars_µö¬é@zendesk.com.eml" (f1) and "_strange_chars_µö¬é@zendesk.com.eml" (f2). At first glance, they look identical (and, indeed, on StackOverflow, they may be; I'm not sure what happens when they are pasted into a form like this.) In my application, however,

f1[16] // ö
f2[16] // o
f1[17] // ¬
f2[17] // ̈

That is, where f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character. What comparison can I do that will show these two strings to be "equal"?


asked Aug 17 '11 18:08

James A. Rosen

1 Answer

f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character.

f1 is in Normalization Form C (NFC, composed) and f2 is in Normalization Form D (NFD, decomposed). In general NFC is the most common form on Windows and the web, with the Unicode FAQ describing it as “the best form for general text”. Unfortunately the Apple world plumped for NFD in order to be gratuitously different.

The strings are canonically equivalent by the rules of Unicode equivalence.
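To make the difference between the two forms visible, here is a minimal sketch that lists the code points of each variant of ö. The helper name `codePoints` is hypothetical, and the strings are written with escapes so that the file's own encoding cannot change them.

```javascript
// ö as a single precomposed code point (NFC).
const composed = "\u00F6";
// o followed by U+0308 COMBINING DIAERESIS (NFD).
const decomposed = "o\u0308";

// Hypothetical helper: list a string's code points as U+XXXX labels.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints(composed));    // [ 'U+00F6' ]
console.log(codePoints(decomposed));  // [ 'U+006F', 'U+0308' ]
console.log(composed === decomposed); // false: === compares code units
```

Both strings render as the same glyph, but a plain `===` comparison sees different sequences and reports them as unequal.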

What comparison can I do that will show these two strings to be "equal"?

In general, you convert both strings to one Normal Form of your choosing and then compare them. For example in Python:

>>> import unicodedata
>>> a= u'\u00F6'  # ö composed
>>> b= u'o\u0308' # o then combining umlaut
>>> unicodedata.normalize('NFC', a)==unicodedata.normalize('NFC', b)
True

Similarly Java has the Normalizer class, .NET has String.Normalize, and many languages have bindings available to the ICU library, which also offers this feature.

Unfortunately, older JavaScript engines have no native Unicode normalisation ability; engines that implement ECMAScript 2015 or later provide String.prototype.normalize. Without it, this means either:

doing it yourself, carting around large Unicode data tables to cover it all in JavaScript (example implementations of this exist); or
sending it back to the server-side (eg via XMLHttpRequest), where you've got a better-equipped language to do it.
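As a sketch of the normalize-then-compare approach in JavaScript itself, assuming an engine that implements `String.prototype.normalize` (ECMAScript 2015 or later): the helper name `canonicallyEqual` is hypothetical, and the two strings are reconstructions of the question's f1 and f2, with the assumption that the é in f2 is decomposed as well.

```javascript
// Hypothetical helper: treat two strings as equal when they are
// canonically equivalent, by normalizing both to the same form first.
function canonicallyEqual(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

// f1 uses precomposed characters; f2 uses base letters plus combining marks.
const f1 = "_strange_chars_\u00B5\u00F6\u00AC\u00E9@zendesk.com.eml";
const f2 = "_strange_chars_\u00B5o\u0308\u00ACe\u0301@zendesk.com.eml";

console.log(f1[16] === "\u00F6");      // true: ö is one code unit in f1
console.log(f2[16] === "o");           // true: f2 has a bare o at that index
console.log(f1 === f2);                // false
console.log(canonicallyEqual(f1, f2)); // true
```

Which form is chosen matters less than using the same one on both sides; NFC and NFD give the same answer for this comparison.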

answered Sep 23 '22 00:09

bobince

Tags:

javascript

string

unicode

normalization

unicode-normalization
