JavaScript strings outside of the BMP

Tags: javascript, unicode, utf-16, surrogate-pairs, astral-plane

(BMP here means the Basic Multilingual Plane.)

According to JavaScript: The Good Parts:

"JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide."

This leads me to believe that JavaScript uses UCS-2 (not UTF-16!) and can only handle characters up to U+FFFF.

Further investigation confirms this:

> String.fromCharCode(0x20001);

The fromCharCode method seems to only use the lowest 16 bits when returning the Unicode character. Trying to get U+20001 (CJK unified ideograph 20001) instead returns U+0001.
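A minimal sketch of that truncation, assuming a standard ECMAScript engine such as Node.js or a browser console. The variable names are only illustrative; the point is that the result equals what you get after masking the argument to its low 16 bits:

```javascript
// String.fromCharCode converts each argument to an unsigned 16-bit value,
// so 0x20001 is reduced to 0x0001 before the string is built.
const truncated = String.fromCharCode(0x20001);
const truncatedUnit = truncated.charCodeAt(0); // 1, i.e. U+0001

// Same result as masking the argument by hand.
const sameAsMasked = truncated === String.fromCharCode(0x20001 & 0xFFFF);

console.log(truncated.length, truncatedUnit.toString(16), sameAsMasked);
```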

Question: is it at all possible to handle post-BMP characters in JavaScript?

Update (2011-07-31): slide twelve of "Unicode Support Shootout: The Good, The Bad, & the (mostly) Ugly" covers issues related to this quite well (slide image: https://i.imgur.com/dLwbz.png).

203

asked Sep 19 '10 06:09

Delan Azabani

1 Answer

Depends what you mean by ‘support’. You can certainly put non-UCS-2 characters in a JS string using surrogates, and browsers will display them if they can.

But each item in a JS string is a separate UTF-16 code unit. There is no language-level support for handling full characters: the standard String members (length, split, slice, etc.) all deal with code units, not characters, so they will quite happily split surrogate pairs or hold invalid surrogate sequences.
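To make the code-unit behaviour concrete, here is a small sketch using U+20001 written as its surrogate pair (high unit 0xD840, low unit 0xDC01). The variable names are illustrative assumptions, not part of any API:

```javascript
// U+20001 as a surrogate pair: one character, two UTF-16 code units.
const ideograph = "\uD840\uDC01";

const unitCount = ideograph.length;      // 2, not 1
const firstHalf = ideograph.slice(0, 1); // a lone high surrogate
const pieces = ideograph.split("");      // the pair is split into two lone surrogates

// A reversed (invalid) surrogate sequence is stored without complaint.
const lone = "\uDC01" + "\uD840";

console.log(unitCount, pieces.length, lone.length);
```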

If you want surrogate-aware methods, I'm afraid you're going to have to start writing them yourself! For example:

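// Count code points: every surrogate pair is two code units but one character.
String.prototype.getCodePointLength= function() {
    return this.length-this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length+1;
};

// Build a string from code points, expanding anything above U+FFFF into a surrogate pair.
String.fromCodePoint= function() {
    var chars= Array.prototype.slice.call(arguments);
    // Walk backwards so that splicing in an extra element does not shift unvisited indices.
    for (var i= chars.length; i-->0;) {
        var n = chars[i]-0x10000;
        if (n>=0)
            chars.splice(i, 1, 0xD800+(n>>10), 0xDC00+(n&0x3FF));
    }
    return String.fromCharCode.apply(null, chars);
};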
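A usage sketch for the same idea, written as standalone functions so that nothing built in is modified. The names codePointLength and fromCodePoints are hypothetical and chosen only for this example. Note that ES2015 later standardised String.fromCodePoint, String.prototype.codePointAt and code-point-aware string iteration, so on a modern engine the assignment to String.fromCodePoint above would replace a native method; the last lines compare against those built-ins, assuming they are available:

```javascript
// Hypothetical standalone variants of the two helpers (names are illustrative).
function codePointLength(str) {
  // Each matched surrogate pair is two code units but one code point.
  const pairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  return str.length - pairs.length;
}

function fromCodePoints(...codePoints) {
  const units = [];
  for (const cp of codePoints) {
    const n = cp - 0x10000;
    if (n >= 0) {
      // Above U+FFFF: emit a high and a low surrogate.
      units.push(0xD800 + (n >> 10), 0xDC00 + (n & 0x3FF));
    } else {
      units.push(cp);
    }
  }
  return String.fromCharCode(...units);
}

const text = fromCodePoints(0x61, 0x20001, 0x62); // "a", U+20001, "b"
const unitLength = text.length;                   // 4 code units
const pointLength = codePointLength(text);        // 3 code points

// Comparison with the ES2015 built-ins (assumes an ES2015+ engine).
const sameString = text === String.fromCodePoint(0x61, 0x20001, 0x62);
const sameCount = pointLength === Array.from(text).length;

console.log(unitLength, pointLength, sameString, sameCount);
```

On engines that have the ES2015 methods, preferring them over hand-rolled helpers is probably the safer choice; the manual version mainly shows what the surrogate arithmetic does.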

110

answered Oct 03 '22 23:10

bobince
