
JavaScript Unicode normalization

I'm under the impression that the JavaScript interpreter assumes the source code it is interpreting has already been normalized. What, exactly, does the normalizing? It can't be the text editor, otherwise the plaintext representation of the source would change. Is there some "preprocessor" that does the normalization?

Matty asked Oct 14 '11 19:10



1 Answer

ECMAScript 6 introduces String.prototype.normalize(), which takes care of Unicode normalization for you.

unorm is a JavaScript polyfill for this method, so you can use String.prototype.normalize() even in engines that do not support it natively. When this answer was first written, no engine did; current browsers and Node.js ship it natively.
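As a rough illustration of what the method does, here is a minimal sketch that compares two canonically equivalent spellings of "é". It assumes an engine with native String.prototype.normalize() (or a polyfill already loaded); the helper name canonicallyEqual is made up for this example and is not part of any library.

```javascript
// Two canonically equivalent spellings of "é".
const precomposed = "\u00E9";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301";  // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Without normalization the two strings differ code unit by code unit.
const rawEqual = precomposed === decomposed;

// Hypothetical helper: compare two strings after normalizing both to NFC.
function canonicallyEqual(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const nfcEqual = canonicallyEqual(precomposed, decomposed);

// NFD splits the precomposed character into base letter + combining mark.
const nfdLength = precomposed.normalize("NFD").length;

console.log(rawEqual, nfcEqual, nfdLength); // false true 2
```

Note that the engine does not normalize string values on its own here: the two literals stay distinct until normalize() is called explicitly.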

For more information on how and when to use Unicode normalization in JavaScript, see JavaScript has a Unicode problem – Accounting for lookalikes.

Mathias Bynens answered Sep 22 '22 00:09