Please look at this script, which operates on a (theoretically possible) string:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<script src="jquery.js"></script>
<script>
$(function () {
    $("#click").click(function () {
        var txt = $('#high-unicode').text();
        var codes = '';
        for (var i = 0; i < txt.length; i++) {
            if (i > 0) codes += ',';
            codes += txt.charCodeAt(i);
        }
        alert(codes);
    });
});
</script>
</head>
<body>
<span id="click">click</span><br />
<span id="high-unicode">𝑥<!-- mathematical italic small x -->󳇠<!-- some char from Supplementary Private Use Area-A -->A<!-- char A -->􈅱<!-- some char from Supplementary Private Use Area-B --></span>
</body>
</html>
Instead of "55349,56421,56204,56800,65,56288,56689", is it possible to get "119909,995808,65,1081713"? I've read more-utf-32-aware-javascript-string and two entries from unicode.org/faq/utf_bom ("Q: What’s the algorithm to convert from UTF-16 to character codes?" and "Q: Isn’t there a simpler way to do this?"), but I'm not sure how to apply this information.
Unicode defines three encoding forms, UTF-8, UTF-16, and UTF-32, built on 8-bit, 16-bit, and 32-bit code units respectively. JavaScript strings are sequences of 16-bit UTF-16 code units: a character in the Basic Multilingual Plane occupies one code unit (2 bytes), while a character above U+FFFF is stored as a surrogate pair of two code units. A code point is usually written as U+hhhh, where hhhh is its hexadecimal value (four to six digits).
In UTF-8, characters in the ASCII range take only one byte, while characters outside the Basic Multilingual Plane take four. UTF-32 uses four bytes per character regardless of which character it is, so it never takes less space than UTF-8 for the same string, and usually takes more.
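To make the size comparison concrete, here is a minimal sketch that counts the bytes a string would occupy in each encoding form. It assumes an environment with a global `TextEncoder` and ES2015 string iteration (modern browsers, recent Node.js). The helper name `encodedSizes` is hypothetical.

```javascript
// Hypothetical helper: byte counts of a string in each Unicode encoding form.
// Assumes a global TextEncoder (modern browsers, recent Node.js).
function encodedSizes(str) {
    return {
        utf8: new TextEncoder().encode(str).length, // 1 to 4 bytes per code point
        utf16: str.length * 2,                      // 2 bytes per 16-bit code unit
        utf32: Array.from(str).length * 4           // 4 bytes per code point
    };
}

var ascii = encodedSizes('A');             // utf8: 1, utf16: 2, utf32: 4
var astral = encodedSizes('\uD835\uDC65'); // U+1D465: utf8: 4, utf16: 4, utf32: 4
console.log(ascii, astral);
```

For the supplementary character all three forms need four bytes, which is why UTF-32 is not strictly larger than UTF-8 in every case.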
It looks like you have to decode surrogate pairs manually. For example:
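function decodeUnicode(str) {
    var r = [], i = 0;
    while (i < str.length) {
        var chr = str.charCodeAt(i++);
        if (chr >= 0xD800 && chr <= 0xDBFF && i < str.length) {
            // high surrogate: a low surrogate must follow to form a pair
            var low = str.charCodeAt(i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                i++;
                r.push(0x10000 + ((chr - 0xD800) << 10) + (low - 0xDC00));
                continue;
            }
        }
        // ordinary character, or an unpaired surrogate passed through as-is
        r.push(chr);
    }
    return r;
}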
Complete code: http://jsfiddle.net/twQWU/
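On engines that implement ES2015, the same result should be available without manual surrogate arithmetic, because string iteration walks code points and `String.prototype.codePointAt` decodes a surrogate pair. This is a minimal sketch that assumes such an engine. The function name `codePoints` is hypothetical, and the test string is rebuilt from the code unit values quoted in the question.

```javascript
// Hypothetical helper: returns the code points of a string.
// Assumes ES2015 support (Array.from, string iteration, codePointAt).
function codePoints(str) {
    // Array.from iterates by code point, so each element is either a single
    // code unit or a full surrogate pair; an unpaired surrogate arrives as
    // its own element and is returned unchanged.
    return Array.from(str, function (ch) {
        return ch.codePointAt(0);
    });
}

// The string from the question, built from its UTF-16 code units.
var txt = String.fromCharCode(55349, 56421, 56204, 56800, 65, 56288, 56689);
console.log(codePoints(txt).join(',')); // 119909,995808,65,1081713
```

On older engines without these methods, manual decoding of surrogate pairs remains the fallback.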