Encode string as UTF-16 to base64 in JavaScript

I'm struggling to find any resources on this online, which is concerning. I've been reading about UCS-2 and UTF-16 woes, but I can't find a solution.

I need to get a value from an input:

var val = $('input').val();

and encode it to base64, treating the text as UTF-16, so:

this is a test

becomes:

dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==

and not the following, which is what you get when treating it as UTF-8:

dGhpcyBpcyBhIHRlc3Q=
Andrew Bullock asked Jun 30 '26 11:06


1 Answer

Your data, once read into JavaScript, is a string: a sequence of 16-bit code units representing Unicode code points, not a sequence of bytes in any particular file or wire encoding (and nothing guarantees that it is in a particular normalization form such as NFC). So if you specifically need the data as a UTF-16 byte sequence, build that byte sequence yourself, then base64 encode it.
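A minimal sketch of that, assuming a browser or Node.js 16+ where `btoa` is a global. The function name `utf16leToBase64` is hypothetical; it writes each UTF-16 code unit as two bytes, low byte first (little endian, no BOM), which is the byte layout the question's expected output uses:

```javascript
// Hypothetical helper: UTF-16LE (no BOM) bytes -> base64.
// Assumes a global btoa (browsers, Node.js 16+).
function utf16leToBase64(str) {
  let binary = "";
  for (let i = 0; i < str.length; i++) {
    // A JS string already stores UTF-16 code units; characters outside the
    // BMP are surrogate pairs, so for well-formed strings writing every
    // code unit yields valid UTF-16.
    const unit = str.charCodeAt(i);
    // Little endian: low byte first, then high byte.
    binary += String.fromCharCode(unit & 0xff, unit >> 8);
  }
  // btoa expects a "binary string" whose chars are all in 0..255.
  return btoa(binary);
}

console.log(utf16leToBase64("this is a test"));
// prints dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==
```

With the jQuery call from the question, that would be `utf16leToBase64($('input').val())`.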

But here's the fun part: which UTF-16 do you need? Little endian or big endian? With or without a BOM? (The expected output in the question is little endian without a BOM.) UTF-16 is a really inconvenient encoding format (and we're not even going to touch UCS-2; it has been obsolete for a long time).
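If you need to pick the byte order or add a BOM, a `DataView` makes both explicit. Another sketch, with a hypothetical `utf16ToBase64` function and options object (`littleEndian`, `bom`); the third argument of `DataView.prototype.setUint16` selects little-endian output:

```javascript
// Hypothetical helper: UTF-16 bytes in a chosen byte order, optional BOM -> base64.
// Assumes a global btoa (browsers, Node.js 16+).
function utf16ToBase64(str, { littleEndian = true, bom = false } = {}) {
  // U+FEFF serializes as FF FE (little endian) or FE FF (big endian).
  const text = bom ? "\uFEFF" + str : str;
  const bytes = new Uint8Array(text.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < text.length; i++) {
    // Write one code unit (2 bytes) in the requested byte order.
    view.setUint16(i * 2, text.charCodeAt(i), littleEndian);
  }
  // Turn the bytes into a binary string that btoa accepts.
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

console.log(utf16ToBase64("hi", { littleEndian: false })); // prints AGgAaQ==
console.log(utf16ToBase64("hi", { bom: true }));           // prints //5oAGkA
```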

What you most likely need is to get the text value from your HTML element, base64 encode it as UTF-8, and then have whatever receives that data decode it as UTF-8; don't try to make JavaScript do more work than it has to. I presume you're sending this data to a server or something, in which case your server-side language almost certainly has built-in functions that can decode text in a huge number of encodings. So just use those. Don't solve Y when your actual problem is X.
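For the UTF-8 route, note that `btoa` on its own throws for any character above U+00FF, so encode to UTF-8 bytes first. A sketch with hypothetical helper names (`utf8ToBase64`, `base64ToUtf8`), using the standard `TextEncoder`/`TextDecoder` APIs; the decode half only stands in for whatever your receiving side does:

```javascript
// Hypothetical helpers; assume global btoa/atob, TextEncoder, TextDecoder
// (modern browsers, Node.js 16+).
function utf8ToBase64(str) {
  // TextEncoder always produces UTF-8 bytes.
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

function base64ToUtf8(b64) {
  // atob returns a binary string; map each char back to a byte.
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  // TextDecoder defaults to UTF-8.
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64("this is a test")); // prints dGhpcyBpcyBhIHRlc3Q=
```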

Mike 'Pomax' Kamermans answered Jul 02 '26 00:07