Who performs unicode normalization and when?

Tags:

unicode

According to JavaScript: The Definitive Guide,

JavaScript assumes that the source code it is interpreting has already been normalized and makes no attempt to normalize identifiers, strings, or regular expressions itself.

The Unicode standard defines the preferred encoding for all characters and specifies a normalization procedure to convert text to a canonical form suitable for comparisons.

If JS does not normalize Unicode then who does it and when?

If JavaScript does not normalize Unicode, then how is

"café" === "caf\u00e9"   // => true

and why is

"café" === "cafe\u0301"   // => false

After all, both \u00e9 and e\u0301 are Unicode ways to form é.

968

asked Jul 23 '17 18:07

1 Answer

You are confusing Unicode normalization with string escaping.

"café"

…is the string made of characters with code points 0x63, 0x61, 0x66, 0xe9.

You can get the exact same string by using the escaped representation

"caf\u00e9"
// or even
"\u0063\u0061\u0066\u00e9"
// or why not
"\u0063\u0061fé"

When reading such a string literal, JavaScript un-escapes it. That is, it replaces each escape sequence with the matching character. It is exactly the same process that replaces "\n" with a newline.
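To make the un-escaping step concrete, here is a minimal sketch that lists the code points of a few equivalent spellings. The helper name `codePoints` is made up for this illustration and is not part of the original answer; it assumes an ES2015 or later engine.

```javascript
// Hypothetical helper for illustration: list a string's code points in hex.
function codePoints(s) {
  return Array.from(s, (ch) => ch.codePointAt(0).toString(16));
}

// Built from numeric code points, so no escape sequence is involved at all.
const fromCodePoints = String.fromCodePoint(0x63, 0x61, 0x66, 0xe9);

// The same string written with escape sequences in the literal.
const viaEscape = "caf\u00e9";
const allEscaped = "\u0063\u0061\u0066\u00e9";

console.log(codePoints(viaEscape));        // [ '63', '61', '66', 'e9' ]
console.log(viaEscape === fromCodePoints); // true
console.log(allEscaped === viaEscape);     // true

// "\n" is handled by the same mechanism: two source characters, one character in the string.
console.log("\n".length);                  // 1
```

By the time any comparison runs, the escapes are already gone; only the resulting code points are compared.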

Your second example is a different string, because it spells é in decomposed (non-normalized) form: it is made of the code points 0x63, 0x61, 0x66, 0x65, 0x301. Since no normalization happens, it does not compare equal to the first string.

Now try the comparison with a literal that actually contains that decomposed sequence. You probably cannot type it on your keyboard, so here it is for you to copy and paste: "café". Test it:

> a = "café"     // this one is copy-pasted with the combining acute
> b = "café"     // this one is typed using the "é" key on my keyboard
> a === "cafe\u0301"
<- true
> b === "cafe\u0301"
<- false
> a === "caf\u00e9"
<- false
> b === "caf\u00e9"
<- true
> a === b
<- false
// Now just making sure...
> a.length
<- 5
> b.length
<- 4

The fact that "café" and "café" are rendered the same does not make them the same string. JavaScript compares the strings, finds that 0x63, 0x61, 0x66, 0xe9 is not the same as 0x63, 0x61, 0x66, 0x65, 0x301 and returns false.
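As for who normalizes and when: the language will not do it implicitly, so it is up to whoever produces or compares the text. As a sketch, assuming an ES2015 or later engine where `String.prototype.normalize` is available, you can normalize both sides yourself before comparing. The variable names are only for illustration.

```javascript
const composed = "caf\u00e9";    // 4 code points, ends in U+00E9
const decomposed = "cafe\u0301"; // 5 code points, ends in U+0065 U+0301

// A plain comparison sees different code points.
console.log(composed === decomposed); // false

// Normalizing both sides to the same form (here NFC) makes them compare equal.
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true

// NFC composes, NFD decomposes.
console.log(decomposed.normalize("NFC").length); // 4
console.log(composed.normalize("NFD").length);   // 5
```

Which form to pick (NFC or NFD) matters less than applying the same one to both strings.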

109

answered Oct 04 '22 09:10

spectras

Question asked by Harshit Juneja; tagged javascript, unicode.