How to convert large UTF-8 strings into ASCII?

I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm.

How can I do this? I am looking for source code (using plain loops) or JavaScript code that does not depend on any platform, framework, or library.

Edit: I understand that the ASCII representation will not look correct and will be larger (in bytes) than its UTF-8 counterpart, since it's an encoded form of the UTF-8 original.

Robin Rodricks asked Nov 28 '22 10:11



2 Answers

You could use an ASCII-only version of the quote function from Douglas Crockford's json2.js, which would look like this:

    var escapable = /[\\\"\x00-\x1f\x7f-\uffff]/g,
        meta = {    // table of character substitutions
            '\b': '\\b',
            '\t': '\\t',
            '\n': '\\n',
            '\f': '\\f',
            '\r': '\\r',
            '"' : '\\"',
            '\\': '\\\\'
        };

    function quote(string) {

        // If the string contains no control characters, no quote characters, no
        // backslash characters, and no non-ASCII characters, then we can safely
        // slap some quotes around it. Otherwise we must also replace the
        // offending characters with safe escape sequences.

        escapable.lastIndex = 0;
        return escapable.test(string) ?
            '"' + string.replace(escapable, function (a) {
                var c = meta[a];
                return typeof c === 'string' ? c :
                    '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
            }) + '"' :
            '"' + string + '"';
    }

This produces a valid, ASCII-only, JavaScript-quoted version of the input string.

For example, quote("Doppelgänger!") returns "Doppelg\u00e4nger!" (including the surrounding double quotes).

To reverse the encoding, parse the result with JSON.parse (or eval it):

    var encoded = quote("Doppelgänger!");
    var back = JSON.parse(encoded); // or: eval(encoded);
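
Since the question asks for a loop-based version with no dependencies, the same idea can be sketched without a regular expression or JSON.parse. This is only a rough sketch: the names `toAscii` and `fromAscii` are made up for illustration and are not part of json2.js, and the output format (only `\\` and `\uXXXX` escapes, no surrounding quotes) is an assumption chosen to keep the decoder short.

```javascript
// Hypothetical helpers for illustration; not part of json2.js.

// Encode: printable ASCII passes through, the backslash is doubled,
// and every other UTF-16 code unit becomes \uXXXX.
function toAscii(str) {
    var out = '';
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code === 0x5c) {
            out += '\\\\';          // escape the backslash so decoding is unambiguous
        } else if (code >= 0x20 && code < 0x7f) {
            out += str.charAt(i);   // printable ASCII stays as-is
        } else {
            out += '\\u' + ('0000' + code.toString(16)).slice(-4);
        }
    }
    return out;
}

// Decode: reverse the two escape forms produced by toAscii.
function fromAscii(str) {
    var out = '';
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        if (ch !== '\\') {
            out += ch;
        } else if (str.charAt(i + 1) === 'u') {
            out += String.fromCharCode(parseInt(str.slice(i + 2, i + 6), 16));
            i += 5;                 // skip over "uXXXX"
        } else {
            out += str.charAt(i + 1); // a doubled backslash decodes to one
            i += 1;
        }
    }
    return out;
}

var encoded = toAscii("Doppelgänger!"); // Doppelg\u00e4nger!
var back = fromAscii(encoded);          // Doppelgänger!
```

Characters outside the Basic Multilingual Plane are handled as two surrogate code units, each escaped separately, so they should round-trip as well. The decoder assumes its input was produced by `toAscii` and does no validation of malformed escapes.
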
fforw answered Dec 04 '22 23:12



Any UTF-8 string that is reversibly convertible to ASCII is already ASCII.

UTF-8 can represent any Unicode character; ASCII cannot.
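
To see whether a given string is already in the pure-ASCII case, a simple loop over its code units is enough. A minimal sketch, where `isAscii` is a hypothetical helper name:

```javascript
// Hypothetical helper: true if every code unit is in the 7-bit ASCII range.
function isAscii(str) {
    for (var i = 0; i < str.length; i++) {
        if (str.charCodeAt(i) > 0x7f) {
            return false;   // found a character ASCII cannot represent
        }
    }
    return true;
}

isAscii("Doppelganger!"); // true
isAscii("Doppelgänger!"); // false
```
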

Neall answered Dec 04 '22 21:12
