JavaScript strings - UTF-16 vs UCS-2?

Tags:

utf-16

I've read in some places that JavaScript strings are UTF-16, and in other places they're UCS-2. I did some searching around to try to figure out the difference and found this:

Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not define a distinct data format, because UTF-16 and UCS-2 are identical for purposes of data exchange. Both are 16-bit, and have exactly the same code unit representation.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.

via: http://www.unicode.org/faq/utf_bom.html#utf16-11

So my question is: is it because the JavaScript string object's methods and indexes act on 16-bit data values instead of characters that some people consider it UCS-2? And if so, would a JavaScript string object oriented around characters instead of 16-bit data chunks be considered UTF-16? Or is there something else I'm missing?

Edit: As requested, here are some sources saying JavaScript strings are UCS-2:

http://blog.mozilla.com/nnethercote/2011/07/01/faster-javascript-parsing/

http://terenceyim.wordpress.com/tag/ucs2/

EDIT: For anyone who may come across this, be sure to check out this link:

http://mathiasbynens.be/notes/javascript-encoding

asked Jan 03 '12 17:01

patorjk

2 Answers

JavaScript (strictly speaking, ECMAScript) pre-dates Unicode 2.0, so in some cases you may find references to UCS-2 simply because that was correct at the time the reference was written. Can you point us to specific citations of JavaScript being "UCS-2"?

The specifications for ECMAScript versions 3 and 5, at least, both explicitly declare a String to be a collection of unsigned 16-bit integers and state that if those integer values are meant to represent textual data, then they are UTF-16 code units. See section 8.4 of the ECMAScript Language Specification.
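To make the "16-bit integers" point concrete, here is a small sketch of how a character outside the Basic Multilingual Plane looks from inside a string. The variable names are only illustrative, and the sample character (U+1F4A9) is an arbitrary choice; any supplementary character should behave the same way.

```javascript
// U+1F4A9 lies outside the Basic Multilingual Plane, so UTF-16 encodes it
// as a surrogate pair: high surrogate 0xD83D followed by low surrogate 0xDCA9.
var astral = '\uD83D\uDCA9';

// length counts 16-bit code units, not characters.
var unitCount = astral.length;

// charCodeAt returns each 16-bit code unit separately.
var highUnit = astral.charCodeAt(0);
var lowUnit = astral.charCodeAt(1);

// A BMP character such as U+00E9 occupies a single code unit.
var bmpLength = '\u00E9'.length;

console.log(unitCount, highUnit.toString(16), lowUnit.toString(16), bmpLength);
// prints: 2 d83d dca9 1
```

So one user-visible character reports a length of 2, which is the behaviour that leads people to describe the string API as UCS-2-like.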

EDIT: I'm no longer sure my answer is entirely correct. See the excellent article mentioned above, http://mathiasbynens.be/notes/javascript-encoding, which in essence says that while a JavaScript engine may use UTF-16 internally, and most do, the language itself effectively exposes those characters as if they were UCS-2.

answered Sep 19 '22 16:09

dgvid

It's UTF-16/UCS-2. It can handle surrogate pairs, but charAt and charCodeAt work on single 16-bit code units rather than Unicode code points. If you want to have it handle surrogate pairs, I suggest a quick read through this.
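As a rough illustration of handling surrogate pairs by hand, the sketch below combines a high and low surrogate into one code point using the standard UTF-16 decoding arithmetic. The helpers codePointAtIndex and countCodePoints are hypothetical names made up for this example, not built-ins. Engines that implement ES2015 or later also provide String.prototype.codePointAt, which covers the first helper.

```javascript
// Hypothetical helper: returns the code point starting at `index`, combining
// a high surrogate with the low surrogate that follows it when both are present.
// An unpaired surrogate is returned unchanged as its own code unit value.
function codePointAtIndex(str, index) {
  var first = str.charCodeAt(index);
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < str.length) {
    var second = str.charCodeAt(index + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      // Standard UTF-16 decoding: 10 bits from each surrogate, plus 0x10000.
      return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
    }
  }
  return first;
}

// Hypothetical helper: counts code points rather than 16-bit code units.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    if (codePointAtIndex(str, i) > 0xFFFF) {
      i++; // skip the low surrogate of the pair just decoded
    }
    count++;
  }
  return count;
}

var sample = 'a\uD83D\uDCA9b';
console.log(sample.length);                            // 4 code units
console.log(countCodePoints(sample));                  // 3 code points
console.log(codePointAtIndex(sample, 1).toString(16)); // 1f4a9
```

The string still stores four 16-bit units; only the helpers interpret the middle two as a single character.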

answered Sep 20 '22 16:09

Daniel Moses
