
JavaScript "🚀".charCodeAt(0) stuck at 55357?

The following doesn't seem correct:

"🚀".charCodeAt(0);  // returns 55357 in both Firefox and Chrome

That's the Unicode character named ROCKET (U+1F680); its decimal code point should be 128640.

This is for a Unicode app I'm writing. Most (but not all) characters from Unicode 6 seem stuck at 55357.

How can I fix it? Thanks.

asked Mar 03 '13 by Xah Lee


2 Answers

JavaScript uses UTF-16 encoding; see this article for details:

Characters outside the BMP, e.g. U+1D306 tetragram for centre (𝌆), can only be encoded in UTF-16 using two 16-bit code units: 0xD834 0xDF06. This is called a surrogate pair. Note that a surrogate pair only represents a single character.

The first code unit of a surrogate pair is always in the range from 0xD800 to 0xDBFF, and is called a high surrogate or a lead surrogate.

The second code unit of a surrogate pair is always in the range from 0xDC00 to 0xDFFF, and is called a low surrogate or a trail surrogate.
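You can verify the encoding direction in a console. For example, U+1D306 from the quote above really does occupy two UTF-16 code units:

```javascript
// "𝌆" (U+1D306) is stored as the surrogate pair 0xD834 0xDF06
"𝌆".length;                     // 2 — two UTF-16 code units, one character
"𝌆".charCodeAt(0).toString(16); // "d834" (high/lead surrogate)
"𝌆".charCodeAt(1).toString(16); // "df06" (low/trail surrogate)
```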

You can decode the surrogate pair like this:

codePoint = (text.charCodeAt(0) - 0xD800) * 0x400 + text.charCodeAt(1) - 0xDC00 + 0x10000

Complete code can be found in the Mozilla documentation for charCodeAt.
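Wrapping that formula in a small helper (the name getCodePoint is just illustrative, not a standard API), assuming the input starts with either a BMP character or a well-formed surrogate pair:

```javascript
// Decode the code point at the start of a string, handling surrogate pairs.
function getCodePoint(text) {
  var high = text.charCodeAt(0);
  if (high >= 0xD800 && high <= 0xDBFF) {
    // high surrogate: combine with the trailing low surrogate
    var low = text.charCodeAt(1);
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
  }
  return high; // BMP character: the code unit IS the code point
}

getCodePoint("🚀"); // 128640 (U+1F680)
getCodePoint("A");  // 65
```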

answered Nov 07 '22 by Daniel


Tried this out:

> "🚀".charCodeAt(0);
55357

> "🚀".charCodeAt(1);
56960
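In newer environments (ES2015+), String.prototype.codePointAt decodes the surrogate pair for you, and String.fromCodePoint goes the other way:

```javascript
// codePointAt combines the surrogate pair into the full code point
"🚀".codePointAt(0);          // 128640 (U+1F680)
// charCodeAt still reports the raw UTF-16 code units
"🚀".charCodeAt(0);           // 55357 (high surrogate)
"🚀".charCodeAt(1);           // 56960 (low surrogate)
// fromCodePoint rebuilds the character from the code point
String.fromCodePoint(128640); // "🚀"
```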

Related questions on SO:

  • Expressing UTF-16 unicode characters in JavaScript
  • Unicode characters from charcode in javascript for charcodes > 0xFFFF

You might want to take a look at this too:

  • Getting it to work with higher values
answered Nov 07 '22 by Samuel Liew