Bytes to string and back
The functions written there work correctly, that is, pack(unpack("string")) yields "string". But I would like to get the same result that "string".getBytes("UTF8") gives in Java.
How can I write a function in JavaScript that does the same thing as Java's getBytes("UTF8")?
For Latin strings, unpack(str) from the article mentioned above gives the same result as getBytes("UTF8"), except that it adds a 0 at every odd position. But with non-Latin strings it seems to work completely differently. Is there a way to work with string data in JavaScript the way Java does?
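The article's unpack is not reproduced here, so this is only a guess at the cause: assuming unpack emits each UTF-16 code unit of the string as two bytes, high byte first, the extra zeros for Latin text follow directly, because those characters have code units below 256. The function utf16Bytes below is a hypothetical stand-in written under that assumption, not the article's code:

```javascript
// Hypothetical stand-in for the article's unpack (assumption: it splits each
// UTF-16 code unit into a high byte and a low byte, big-endian).
function utf16Bytes(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i); // UTF-16 code unit, 0..0xFFFF
    out.push(unit >> 8, unit & 0xFF); // high byte, then low byte
  }
  return out;
}

console.log(JSON.stringify(utf16Bytes("ab")));     // [0,97,0,98]
console.log(JSON.stringify(utf16Bytes("\u044f"))); // [4,79]  ("я", U+044F)
```

For a Cyrillic letter such as "я" the two schemes diverge completely: UTF-16 gives 0x04 0x4F, while UTF-8 (what Java's getBytes("UTF8") produces) gives 0xD1 0x8F.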
You don't need to write a full-on UTF-8 encoder; there is a much easier JS idiom to convert a Unicode string into a string of bytes representing UTF-8 code units:
unescape(encodeURIComponent(str))
(This works because the odd encoding used by escape/unescape treats each %xx hex sequence as the ISO-8859-1 character with that code, rather than as part of a UTF-8 sequence the way URI-component escaping does. So every UTF-8 byte produced by encodeURIComponent becomes one character whose code is that byte's value. Similarly, decodeURIComponent(escape(bytes)) goes in the other direction.)
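A quick check of both directions, using only the built-ins named above. Note that escape and unescape are legacy functions (ECMAScript Annex B); they are still available in browsers and Node.js but are deprecated. The names toBytes and fromBytes are just illustrative wrappers:

```javascript
// Encode: encodeURIComponent produces %xx for each UTF-8 byte, and unescape
// turns each %xx into one character whose code is that byte (0..255).
function toBytes(str) {
  return unescape(encodeURIComponent(str));
}

// Decode: escape turns each 128..255 character back into %xx, and
// decodeURIComponent reads the %xx sequences as UTF-8.
function fromBytes(bytes) {
  return decodeURIComponent(escape(bytes));
}

var s = "\u00e9\u20ac"; // "é€"
var b = toBytes(s);
console.log(b.length);           // 5 ("é" is 2 bytes in UTF-8, "€" is 3)
console.log(fromBytes(b) === s); // true
```

One caveat: encodeURIComponent throws a URIError if the string contains an unpaired surrogate, so this idiom only handles well-formed Unicode strings.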
So if you want an array of byte values, it would be:
function toUTF8Array(str) {
    // One character per UTF-8 byte, each with a code in 0..255
    var utf8 = unescape(encodeURIComponent(str));
    var arr = new Array(utf8.length);
    for (var i = 0; i < utf8.length; i++)
        arr[i] = utf8.charCodeAt(i); // the byte value
    return arr;
}
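One difference from Java remains: Java's byte type is signed, so "é".getBytes("UTF8") holds -61, -87, while toUTF8Array returns 195, 169. The sketch below maps the unsigned values into Java's -128..127 range and also converts an array back into a string; toJavaBytes and fromUTF8Array are hypothetical names, and toUTF8Array is repeated so the block runs on its own. In current browsers and recent Node.js versions, the standard TextEncoder is an alternative to the escape trick, and the last lines cross-check against it:

```javascript
// toUTF8Array as in the answer above: one number (0..255) per UTF-8 byte.
function toUTF8Array(str) {
  var utf8 = unescape(encodeURIComponent(str));
  var arr = new Array(utf8.length);
  for (var i = 0; i < utf8.length; i++)
    arr[i] = utf8.charCodeAt(i);
  return arr;
}

// Hypothetical helper: reinterpret each unsigned byte as Java's signed byte.
function toJavaBytes(str) {
  return toUTF8Array(str).map(function (b) {
    return b > 127 ? b - 256 : b;
  });
}

// Hypothetical inverse: rebuild the byte string, then decode it as UTF-8.
function fromUTF8Array(arr) {
  var bytes = "";
  for (var i = 0; i < arr.length; i++)
    bytes += String.fromCharCode(arr[i] & 0xFF); // & 0xFF also accepts signed input
  return decodeURIComponent(escape(bytes));
}

var s = "\u00e9"; // "é"
console.log(JSON.stringify(toUTF8Array(s))); // [195,169]
console.log(JSON.stringify(toJavaBytes(s))); // [-61,-87]
console.log(fromUTF8Array([-61, -87]) === s); // true

// Cross-check against the standard TextEncoder.
var encoded = Array.from(new TextEncoder().encode(s));
console.log(JSON.stringify(encoded)); // [195,169]
```

Masking with & 0xFF in fromUTF8Array lets the same function accept either the unsigned values from toUTF8Array or Java-style signed bytes.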