
Unicode characters from charcode in javascript for charcodes > 0xFFFF

I need to get a string / char from a Unicode char code and finally put it into a DOM TextNode to add to an HTML page using client-side JavaScript.

Currently, I am doing:

String.fromCharCode(parseInt(charcode, 16));

where charcode is a hex string containing the char code, e.g. "1D400". The Unicode character that should be returned is 𝐀, but a wrong character is returned instead! Characters in the 16-bit range (0000 ... FFFF) are returned as expected.

Any explanation and / or proposals for correction?

Thanks in advance!

asked Mar 27 '11 by leemes



2 Answers

String.fromCharCode can only handle code points in the BMP (i.e. up to U+FFFF). To handle higher code points, the following function from the Mozilla Developer Network can be used to return the surrogate pair representation:

function fixedFromCharCode (codePt) {
    if (codePt > 0xFFFF) {
        // Split an astral code point into a UTF-16 surrogate pair
        codePt -= 0x10000;
        return String.fromCharCode(0xD800 + (codePt >> 10), 0xDC00 + (codePt & 0x3FF));
    } else {
        return String.fromCharCode(codePt);
    }
}
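For example, combined with the hex parsing from the question (a minimal usage sketch; charcode is assumed to hold a hex string such as "1D400"):

var charcode = "1D400";
var text = fixedFromCharCode(parseInt(charcode, 16));
// Append the result to the page as a DOM text node
document.body.appendChild(document.createTextNode(text)); // renders 𝐀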
answered Oct 13 '22 by Anomie


The problem is that strings in JavaScript are (mostly) UCS-2 encoded, but a character outside the Basic Multilingual Plane can be represented in JavaScript as a UTF-16 surrogate pair.
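To illustrate the surrogate-pair representation (a small sketch; the code units below follow from applying the UTF-16 algorithm to U+1D400):

// U+1D400 (𝐀) lies outside the BMP, so it occupies two 16-bit code units.
var s = "\uD835\uDC00";
console.log(s.length);                     // 2 (two UTF-16 code units)
console.log(s.charCodeAt(0).toString(16)); // "d835" (high surrogate)
console.log(s.charCodeAt(1).toString(16)); // "dc00" (low surrogate)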

The following function is adapted from Converting punycode with dash character to Unicode:

function utf16Encode(input) {
    var output = [], i = 0, len = input.length, value;
    while (i < len) {
        value = input[i++];
        if ( (value & 0xF800) === 0xD800 ) {
            // Input code points must not themselves be surrogate values
            throw new RangeError("UTF-16(encode): Illegal UTF-16 value");
        }
        if (value > 0xFFFF) {
            // Encode an astral code point as a surrogate pair
            value -= 0x10000;
            output.push(String.fromCharCode(((value >>> 10) & 0x3FF) | 0xD800));
            value = 0xDC00 | (value & 0x3FF);
        }
        output.push(String.fromCharCode(value));
    }
    return output.join("");
}

alert( utf16Encode([0x1D400]) );
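As a side note, newer JavaScript engines (ES2015 and later) also provide String.fromCodePoint, which performs this conversion natively; whether it is available depends on the target environment:

// ES2015+: handles code points above 0xFFFF directly
alert( String.fromCodePoint(0x1D400) ); // 𝐀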
answered Oct 13 '22 by Tim Down