 

Converting from a Uint8Array to a string and back

I'm having an issue converting a particular Uint8Array to a string and back. I'm working in the browser, in Chrome, which natively supports the TextEncoder/TextDecoder APIs.

If I start with a simple case, everything seems to work well:

const uintArray = new TextEncoder().encode('silly face demons');
// Uint8Array(17) [115, 105, 108, 108, 121, 32, 102, 97, 99, 101, 32, 100, 101, 109, 111, 110, 115]
new TextDecoder().decode(uintArray);
// silly face demons

But the following case is not giving me the results I expect. Without getting into too much of the details (it's cryptography related), let's start with the fact that I'm provided with the following Uint8Array:

Uint8Array(24) [58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208, 46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]

What I want to do is convert that to a string and then later convert the string back to the original array, but I get this:

const uintArray = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208, 46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);
new TextDecoder().decode(uintArray);
// :�f��:����."�L��epf��
new TextEncoder().encode(':�f��:����."�L��epf��');

...which results in: Uint8Array(48) [58, 239, 191, 189, 7, 102, 239, 191, 189, 239, 191, 189, 58, 239, 191, 189, 239, 191, 189, 17, 239, 191, 189, 239, 191, 189, 46, 34, 239, 191, 189, 4, 76, 239, 191, 189, 239, 191, 189, 101, 112, 102, 239, 191, 189, 239, 191, 189]

The array has doubled in length. Encoding is a bit out of my wheelhouse. Can anyone tell me why the array has doubled (I'm assuming it's an alternate representation of the original array...?). Also, and more importantly, is there a way I could get back to the original array (i.e. undouble the one I'm getting)?

robmisio asked Mar 05 '23 19:03


2 Answers

The array you are trying to decode as UTF-8 contains bytes that don't make sense in UTF-8 or are not allowed at all. Every byte >= 128 is only valid as part of a multi-byte sequence: some values are lead bytes that must be followed by the right number of continuation bytes, and some, like 254, are never allowed. The decoder replaces each invalid byte with the replacement character U+FFFD (�), which encodes back to the three bytes 239, 191, 189. That is why the array grows, and why the original bytes cannot be recovered from the string. If you want to convert back and forth, you need to make sure you start from valid UTF-8. The codepage layout here might be useful: https://en.wikipedia.org/wiki/UTF-8#Codepage_layout as might the description of illegal byte sequences: https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences.

As a concrete example, this:

let arr = new TextDecoder().decode(new Uint8Array([194, 169]))
let res = new TextEncoder().encode(arr) // => [194, 169]

works because [194, 169] is valid UTF-8 for © but:

let arr = new TextDecoder().decode(new Uint8Array([194, 27]))
let res = new TextEncoder().encode(arr) // => [239, 191, 189, 27]

doesn't, because 194 is a lead byte that must be followed by a continuation byte (128 to 191), and 27 is not one.
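Applied to the array from the question, the same effect can be counted directly. This is only a sketch: `isValidUtf8` is a hypothetical helper name, not a built-in; it relies on the `fatal` option of `TextDecoder`, which makes decoding throw instead of silently substituting U+FFFD.

```javascript
// The 24 bytes from the question.
const original = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208,
  46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);

// Default (non-fatal) decoding replaces every invalid byte with U+FFFD.
const text = new TextDecoder().decode(original);
const replaced = [...text].filter((ch) => ch === '\uFFFD').length;
const reencoded = new TextEncoder().encode(text);

console.log(replaced);         // 12 bytes were invalid
console.log(reencoded.length); // 48 = 12 valid bytes + 12 * 3 bytes for U+FFFD

// Hypothetical helper: report whether bytes are valid UTF-8 before trusting a round trip.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false; // decode() throws a TypeError on invalid input when fatal is set
  }
}

console.log(isValidUtf8(original));                    // false
console.log(isValidUtf8(new Uint8Array([194, 169])));  // true
```

Half of the bytes happen to be invalid here, and each of those turns from one byte into three, which is how 24 bytes become 48.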

Mark answered Mar 27 '23 22:03


To get a string from a Uint8Array and back:

var u8arr = new Uint8Array([34, 128, 255]);
var u8str = u8arr.toString();  // Convert Uint8Array to String
console.log(u8str);
var u8arr2 = Uint8Array.from(u8str.split(',').map(x=>parseInt(x,10)));
console.log(u8arr2);  // back to Uint8Array

This does not suffer from UTF-8 issues, because the string holds the decimal value of each byte ("34,128,255") rather than decoded text.
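If a more compact string is wanted, the same idea works with hexadecimal instead of a comma-separated list. The following is a minimal sketch, not a standard API: `bytesToHex` and `hexToBytes` are hypothetical helper names, and `hexToBytes` assumes its input is a well-formed, even-length hex string.

```javascript
// Hypothetical helper: two lowercase hex digits per byte.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Hypothetical helper: inverse of bytesToHex (assumes valid, even-length hex).
function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

const u8 = new Uint8Array([34, 128, 255]);
const hex = bytesToHex(u8);
console.log(hex);             // "2280ff"
console.log(hexToBytes(hex)); // Uint8Array(3) [34, 128, 255]
```

Like the comma-separated form, this never interprets the bytes as text, so any byte value survives the round trip.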

arun answered Mar 27 '23 20:03