Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Change JavaScript string encoding

Tags:

At the moment I have a large JavaScript string I'm attempting to write to a file, but in a different encoding (ISO-8859-1). I was hoping to use something like downloadify. Downloadify only accepts normal JavaScript strings or base64 encoded strings.

Because of this, I've decided to compress my string using JSZip which generates a nicely base64 encoded string that can be passed to downloadify, and downloaded to my desktop. Huzzah! The issue is that the string I compressed, of course, is still the wrong encoding.

Luckily JSZip can take a Uint8Array as data, instead of a string. So is there any way to convert a JavaScript string into a ISO-8859-1 encoded string and store it in a Uint8Array?

Alternatively, if I'm approaching this all wrong, is there a better solution all together? Is there a fancy JavaScript string class that can use different internal encodings?

Edit: To clarify, I'm not pushing this string to a webpage so it won't automatically convert it for me. I'm doing something like this:

var zip = new JSZip();
zip.file("genSave.txt", result);

return zip.generate({compression:"DEFLATE"});

And for this to make sense, I would need result to be in the proper encoding (and JSZip only takes strings, arraybuffers, or uint8arrays).

Final Edit (This was -not- a duplicate question because the result wasn't being displayed in the browser or transmitted to a server where the encoding could be changed):

This turned out to be a little more obscure than I had thought, so I ended up rolling my own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like I did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or convert it into a windows-1252 encoded string using this string encoding library:

var string = TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

function string_transcoder (target) {

    this.encodeList = encodings[target];
    if (this.encodeList === undefined) {
        return undefined;
    }

    //Initialize the easy encodings
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;          
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; //This encoding is messed up
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};
like image 887
David Avatar asked Sep 18 '13 18:09

David


People also ask

Are JavaScript strings UTF-8?

While a JavaScript source file can have any kind of encoding, JavaScript will then convert it internally to UTF-16 before executing it. JavaScript strings are all UTF-16 sequences, as the ECMAScript standard says: When a String contains actual textual data, each element is considered to be a single UTF-16 code unit.

How is JavaScript encoded?

Because Javascript was invented twenty years ago in the space of ten days, it uses an encoding that uses two bytes to store each character, which translates roughly to an encoding called UCS-2, or another one called UTF-16.

What is encodeURI in JavaScript?

The encodeURI() function encodes a URI by replacing each instance of certain characters by one, two, three, or four escape sequences representing the UTF-8 encoding of the character (will only be four escape sequences for characters composed of two "surrogate" characters).

How do I encode a string to UTF-8 in Node?

nameString. toString("utf8");


2 Answers

This turned out to be a little more obscure than [the author] had thought, so [the author] ended up rolling [his] own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like [the author] did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or convert it into a windows-1252 encoded string using this string encoding library:

var string = TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

function string_transcoder (target) {

    this.encodeList = encodings[target];
    if (this.encodeList === undefined) {
        return undefined;
    }

    //Initialize the easy encodings
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;          
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; //This encoding is messed up
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};
like image 178
Nate Avatar answered Oct 29 '22 21:10

Nate


Test the following script:

<script type="text/javascript" charset="utf-8">
like image 22
user2511140 Avatar answered Oct 29 '22 21:10

user2511140