To express, for example, the character U+10400 in JavaScript, I use "\uD801\uDC00"
or String.fromCharCode(0xD801) + String.fromCharCode(0xDC00)
. How do I figure that out for a given unicode character? I want the following:
var char = getUnicodeCharacter(0x10400);
How do I find 0xD801
and 0xDC00
from 0x10400
?
In Javascript, the identifiers and string literals can be expressed in Unicode via a Unicode escape sequence. The general syntax is \uXXXX , where X denotes four hexadecimal digits. For example, the letter o is denoted as '\u006F' in Unicode.
Most JavaScript engines use UTF-16 encoding, so let's detail into UTF-16. UTF-16 (the long name: 16-bit Unicode Transformation Format) is a variable-length encoding: Code points from BMP are encoded using a single code unit of 16-bit. Code points from astral planes are encoded using two code units of 16-bit each.
Utf-8 and utf-16 are character encodings that each handle the 128,237 characters of Unicode that cover 135 modern and historical languages. Unicode is a standard and utf-8 and utf-16 are implementations of the standard. While Unicode is currently 128,237 characters it can handle up to 1,114,112 characters.
UTF-16 (16-bit Unicode Transformation Format) is an extension of UCS-2 that allows representing code points outside the BMP. It produces a variable-length result of either one or two 16-bit code units per code point. This way, it can encode code points in the range from 0 to 0x10FFFF .
Based on the wikipedia article given by Henning Makholm, the following function will return the correct character for a code point:
function getUnicodeCharacter(cp) {
if (cp >= 0 && cp <= 0xD7FF || cp >= 0xE000 && cp <= 0xFFFF) {
return String.fromCharCode(cp);
} else if (cp >= 0x10000 && cp <= 0x10FFFF) {
// we substract 0x10000 from cp to get a 20-bits number
// in the range 0..0xFFFF
cp -= 0x10000;
// we add 0xD800 to the number formed by the first 10 bits
// to give the first byte
var first = ((0xffc00 & cp) >> 10) + 0xD800
// we add 0xDC00 to the number formed by the low 10 bits
// to give the second byte
var second = (0x3ff & cp) + 0xDC00;
return String.fromCharCode(first) + String.fromCharCode(second);
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With