
Converting ArrayBuffer to String then back to ArrayBuffer using TextDecoder/TextEncoder returning a different result

I have an ArrayBuffer returned by reading memory with Frida. I convert the ArrayBuffer to a string and then back to an ArrayBuffer using TextDecoder and TextEncoder, but the result is altered in the process: the ArrayBuffer length after decoding and re-encoding always comes out larger. Is some character being decoded in a way that expands it?

How can I decode an ArrayBuffer to a String, then back to an ArrayBuffer without losing integrity?

Example code:

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

var textDecoder = new TextDecoder("utf-8");
var textEncoder = new TextEncoder("utf-8");

//Decode and encode same data without making any changes
var decoded = textDecoder.decode(arrayBuff);
var encoded = textEncoder.encode(decoded);

console.log(encoded.byteLength); //Varies between runs, but is always greater than 2,000
Hem asked May 06 '18 09:05



1 Answer

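TextDecoder and TextEncoder are designed to work with text. Arbitrary memory usually contains byte sequences that are not valid UTF-8; by default TextDecoder replaces each invalid sequence with U+FFFD (the replacement character), which TextEncoder then writes back as three bytes, so the output grows. To convert an arbitrary byte sequence into a string and back, it's best to treat each byte as a single character.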

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

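//Decode and encode the same data without making any changes
//Map each byte to the character with the same code (0-255)
var decoded = String.fromCharCode(...new Uint8Array(arrayBuff));
//Map each character back to its byte value
var encoded = Uint8Array.from([...decoded].map(ch => ch.charCodeAt(0))).buffer;

console.log(encoded.byteLength); //2,000 again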
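
For buffers much larger than 2,000 bytes, passing every byte as a separate argument to a single String.fromCharCode call may exceed the engine's argument limit. Below is a minimal sketch of the same byte-per-character round trip done in chunks. The names bytesToBinaryString and binaryStringToBytes and the chunk size are hypothetical, chosen only for illustration, and a filled Uint8Array stands in for the result of Memory.readByteArray so the snippet runs outside Frida.

```javascript
// Hypothetical helpers for illustration; not part of Frida or any library.

// Convert an ArrayBuffer to a string with one character per byte (codes 0-255).
function bytesToBinaryString(buffer) {
  const bytes = new Uint8Array(buffer);
  const CHUNK = 0x8000; // assumed chunk size, kept well below typical argument limits
  let result = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    // apply() accepts an array-like, so a typed-array view can be passed directly
    result += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return result;
}

// Convert such a string back to an ArrayBuffer, one byte per character.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff; // keep only the low byte
  }
  return bytes.buffer;
}

// Stand-in for Memory.readByteArray(pointer, 2000): arbitrary byte values.
const original = new Uint8Array(2000);
for (let i = 0; i < original.length; i++) {
  original[i] = (i * 31) & 0xff;
}

const decoded = bytesToBinaryString(original.buffer);
const encoded = binaryStringToBytes(decoded);

console.log(decoded.length);     // same as the input byte length
console.log(encoded.byteLength); // same as the input byte length
```

The helpers trade the one-liner's brevity for predictable behavior on large inputs; for small buffers the spread version does the same job.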

The decoded string has exactly the same length as the input buffer and can be easily manipulated with regular expressions, string methods, etc. Beware, though, that characters stored as two or more bytes (e.g. "π" in UTF-8) won't be recognizable anymore: each one shows up as the concatenation of the characters corresponding to its individual byte values.
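
The two effects discussed here can be reproduced on a few bytes. The sketch below assumes a runtime where TextDecoder and TextEncoder are available as globals (as in the question); the byte values are made up for illustration. It shows first how a UTF-8 round trip expands an invalid byte, then how "π" looks under the byte-per-character mapping and how the text can still be recovered from the bytes.

```javascript
// 1) Why the UTF-8 round trip grows: 0xFF never occurs in well-formed UTF-8.
const invalid = new Uint8Array([0x41, 0xff, 0x42]); // "A", invalid byte, "B"
const lossy = new TextDecoder("utf-8").decode(invalid);
const reencoded = new TextEncoder().encode(lossy);

console.log(lossy.length);     // 3: "A", U+FFFD, "B"
console.log(reencoded.length); // 5: U+FFFD is encoded as EF BF BD

// 2) The caveat: "π" is two bytes in UTF-8 (CF 80), so the byte-per-character
//    mapping yields two unrelated characters instead of one "π".
const piBytes = new TextEncoder().encode("π");
const asBinary = String.fromCharCode(...piBytes);

console.log(asBinary.length);  // 2

// The bytes are intact, so decoding them as UTF-8 afterwards restores the text.
const restored = Uint8Array.from(asBinary, ch => ch.charCodeAt(0));
const piAgain = new TextDecoder("utf-8").decode(restored);

console.log(piAgain === "π");  // true
```

In other words, the byte-per-character string is safe for carrying data around, and real text decoding can be applied later to just the slices known to hold UTF-8.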

GOTO 0 answered Oct 05 '22 19:10
