I'm having an issue converting a particular Uint8Array to a string and back. I'm working in the browser, in Chrome, which natively supports the TextEncoder/TextDecoder APIs.
If I start with a simple case, everything seems to work well:
const uintArray = new TextEncoder().encode('silly face demons');
// Uint8Array(17) [115, 105, 108, 108, 121, 32, 102, 97, 99, 101, 32, 100, 101, 109, 111, 110, 115]
new TextDecoder().decode(uintArray); // silly face demons
But the following case is not giving me the results I expect. Without getting into too much of the details (it's cryptography related), let's start with the fact that I'm provided with the following Uint8Array:
Uint8Array(24) [58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208, 46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]
and what I want to do is convert that to a string and then later convert the string back to the original array, but I get this:
const uintArray = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208, 46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);
new TextDecoder().decode(uintArray); // :�f��:����."�L��epf��
new TextEncoder().encode(':�f��:����."�L��epf��');
...which results in:
Uint8Array(48) [58, 239, 191, 189, 7, 102, 239, 191, 189, 239, 191, 189, 58, 239, 191, 189, 239, 191, 189, 17, 239, 191, 189, 239, 191, 189, 46, 34, 239, 191, 189, 4, 76, 239, 191, 189, 239, 191, 189, 101, 112, 102, 239, 191, 189, 239, 191, 189]
The array has doubled in length. Encoding is a bit out of my wheelhouse. Can anyone tell me why the array has doubled (I'm assuming it's an alternate representation of the original array...?). Also, and more importantly, is there a way I could get back to the original array (i.e. undouble the one I'm getting)?
The array you are trying to decode as UTF-8 contains bytes that don't make sense or are not allowed. Pretty much every value >= 128 requires special handling. Some of these are allowed, but only as leading bytes of multi-byte sequences, and some, like 254, are never allowed. If you want to convert back and forth, you will need to make sure you are creating valid UTF-8. The codepage layout here might be useful: https://en.wikipedia.org/wiki/UTF-8#Codepage_layout as might the description of illegal byte sequences: https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences.
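To see where the extra bytes come from, here is a small sketch using the array from the question. It assumes the default (non-fatal) TextDecoder behaviour, where each invalid byte is replaced with U+FFFD, and U+FFFD itself encodes as the three bytes 239, 191, 189. The variable names are only illustrative.

```javascript
// The 24 bytes from the question.
const bytes = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208,
  46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);

// Default decoding silently substitutes U+FFFD for every invalid byte.
const text = new TextDecoder().decode(bytes);

// Count how many replacement characters ended up in the string.
const replaced = [...text].filter((ch) => ch === '\uFFFD').length;

// Re-encoding turns each U+FFFD into three bytes: 239, 191, 189.
const reencoded = new TextEncoder().encode(text);

console.log(replaced);         // 12
console.log(reencoded.length); // 48 = 12 untouched bytes + 12 * 3
```

This also suggests why the result cannot be "undoubled": every invalid byte becomes the same U+FFFD, so the information about which byte it originally was is gone.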
As a concrete example, this:
let arr = new TextDecoder().decode(new Uint8Array([194, 169]))
let res = new TextEncoder().encode(arr) // => [194, 169]
works because [194, 169] is valid UTF-8 for ©, but:
let arr = new TextDecoder().decode(new Uint8Array([194, 27]))
let res = new TextEncoder().encode(arr) // => [239, 191, 189, 27]
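doesn't, because [194, 27] is not a valid sequence: 194 is a leading byte and must be followed by a continuation byte in the range 128 to 191.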
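If you want to find out up front whether a given array will survive the round trip, one option is to decode with the `fatal` option, which makes TextDecoder throw instead of inserting U+FFFD. The helper name `isValidUtf8` below is made up for illustration; treat this as a sketch rather than a complete validator.

```javascript
// Returns true when the bytes decode as UTF-8 without any replacement.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUtf8(new Uint8Array([194, 169]))); // true
console.log(isValidUtf8(new Uint8Array([194, 27])));  // false
```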
To get a string from a Uint8Array and back:
var u8arr = new Uint8Array([34, 128, 255]);
var u8str = u8arr.toString(); // Convert Uint8Array to String
console.log(u8str);
var u8arr2 = Uint8Array.from(u8str.split(',').map(x=>parseInt(x,10)));
console.log(u8arr2); // back to Uint8Array
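This does not suffer from UTF-8 issues, because the string holds the decimal value of each byte separated by commas (here "34,128,255") rather than the decoded bytes themselves.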
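A more compact variant of the same idea is to use hex instead of comma-separated decimals. This is only a sketch; `bytesToHex` and `hexToBytes` are hypothetical helper names, not built-in functions, and `hexToBytes` assumes an even-length string of valid hex digits.

```javascript
// Encode every byte as exactly two lowercase hex digits.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Split the string into two-character pairs and parse each one as base 16.
function hexToBytes(hex) {
  const pairs = hex.match(/.{2}/g) || [];
  return Uint8Array.from(pairs, (p) => parseInt(p, 16));
}

const original = new Uint8Array([34, 128, 255]);
const hex = bytesToHex(original);
console.log(hex); // 2280ff

const restored = hexToBytes(hex);
console.log(restored); // same three bytes: 34, 128, 255
```

The string is always twice as long as the array, but any byte value from 0 to 255 round-trips unchanged.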