
JavaScript encoding with Special characters

I wanted to write a method to escape special chars like 'ä' to their corresponding Unicode escape sequences (e.g. \u00e4).

For some reason JS finds it amusing to not even save the 'ä' internally but use 'üÜ' or some other garble, so when I convert it spits out '\u00c3\u00b6\u00c3\u002013' because it converts these chars instead of 'ä'.

I have tried setting the HTML file's encoding to utf-8 and tried loading the scripts with charset="UTF-8" to no avail. The code doesn't really do anything special but here it is:

String.prototype.replaceWithUtf8 = function() {
    var str_newString = '';
    var str_procString = this;

    for (var i = 0; i < str_procString.length; i++) {
        if (str_procString.charCodeAt(i) > 126) {
            var hex_uniCode = '\\u00' + str_procString.charCodeAt(i).toString(16);
            console.log(hex_uniCode + " (" + str_procString.charAt(i) + ")");
            str_newString += hex_uniCode;
        } else {
            str_newString += str_procString.charAt(i);
        }
    }
    return str_newString;
}
var str_item = "Lärm, Lichter, Lücken, Löcher."

console.log(str_item); // LÃ¤rm, Lichter, LÃ¼cken, LÃ¶cher.
console.log(str_item.replaceWithUtf8()); //L\u00c3\u00a4rm, Lichter, L\u00c3\u00bccken, L\u00c3\u00b6cher. 
ProudOne asked Nov 06 '12 09:11



2 Answers

I have no idea how or why, but I just restarted the server again and now it's displaying correctly. To follow up, here's the code for everyone who's interested:

String.prototype.replaceWithUtf8 = function() {
    var str_newString = '';
    var str_procString = this;
    var arr_replace = new Array('/', '"');
    var arr_replaceWith = new Array('\\/', '\\"');

    for (var i = 0; i < str_procString.length; i++) {
        var int_charCode = str_procString.charCodeAt(i);
        var cha_charAt = str_procString.charAt(i);
        var int_chrIndex = arr_replace.indexOf(cha_charAt);

        if (int_chrIndex > -1) {
            console.log(arr_replaceWith[int_chrIndex]);
            str_newString += arr_replaceWith[int_chrIndex];
        } else {
            if (int_charCode > 126 && int_charCode < 65536) {
                var hex_uniCode = '\\u' + ("000" + int_charCode.toString(16)).substr(-4);
                console.log(hex_uniCode + " (" + cha_charAt + ")");
                str_newString += hex_uniCode;
            } else {
                str_newString += cha_charAt;
            }
        }
    }
    return str_newString;
}
ProudOne answered Sep 25 '22 21:09


Use '\\u' + ('000' + str_procString.charCodeAt(i).toString(16)).substr(-4); instead to get the right escape sequences; yours always start with 00. Also, instead of processing your string in a for-loop, .replace() might be faster.
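
A rough sketch of what the .replace() variant could look like. The function name escapeNonAscii is made up for illustration, and it uses the same cutoff as the question (everything above char code 126 gets escaped):

```javascript
// Hypothetical helper, not part of the original code.
function escapeNonAscii(str) {
    // Match every UTF-16 code unit above 126 (0x7e), same cutoff as the loop version.
    return str.replace(/[\u007f-\uffff]/g, function (ch) {
        // Left-pad the hex value to four digits, e.g. "e4" -> "00e4".
        return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

console.log(escapeNonAscii('Lärm, Lichter, Lücken, Löcher.'));
// L\u00e4rm, Lichter, L\u00fccken, L\u00f6cher.
```

Because the regex has no u flag, it works on single code units, so a character outside the BMP comes out as two \uXXXX escapes (its surrogate pair). Whether this is actually faster than the loop would need measuring.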

On your question:

console.log("Lärm, Lichter, Lücken, Löcher."); // LÃ¤rm, Lichter, LÃ¼cken, LÃ¶cher.

does not look as if the file was really served with the right encoding. It might be a server problem, too, if the file is already saved correctly.
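
To illustrate why the output in the question looks the way it does, here is a small sketch that reproduces the garbling on purpose. It assumes the file was saved as UTF-8 but read as if every byte were one Latin-1 character; the helper name misreadUtf8AsLatin1 is made up, and TextEncoder is assumed to be available (current browsers and Node.js have it):

```javascript
// Hypothetical helper: simulates a UTF-8 file being decoded as Latin-1.
function misreadUtf8AsLatin1(str) {
    // Encode the text as UTF-8 bytes; 'ä' becomes the two bytes 0xc3 0xa4.
    var utf8Bytes = new TextEncoder().encode(str);
    // Wrongly turn every single byte into its own character.
    return String.fromCharCode.apply(null, utf8Bytes);
}

var garbled = misreadUtf8AsLatin1('L\u00e4rm'); // 'L\u00e4rm' is "Lärm"
console.log(garbled.length);                      // 5 instead of 4
console.log(garbled.charCodeAt(1).toString(16));  // c3
console.log(garbled.charCodeAt(2).toString(16));  // a4
```

The c3 and a4 are exactly the \u00c3\u00a4 pair from the question's output, which suggests the script text was already decoded with the wrong charset before the escaping function ever ran.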

Bergi answered Sep 22 '22 21:09