I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm.
How can I do this? I need either the source code (using loops) or JavaScript code, and it should not depend on any platform, framework, or library.
Edit: I understand that the ASCII representation will not look correct and will be larger (in bytes) than its UTF-8 counterpart, since it's an encoded form of the UTF-8 original.
You could use an ASCII-only version of the quote function from Douglas Crockford's json2.js, which would look like this:
```javascript
// Matches backslashes, double quotes, control characters (U+0000-U+001F),
// and every UTF-16 code unit from U+007F upward, i.e. anything that is not
// printable ASCII.
var escapable = /[\\\"\x00-\x1f\x7f-\uffff]/g,
    meta = {    // table of character substitutions
        '\b': '\\b',
        '\t': '\\t',
        '\n': '\\n',
        '\f': '\\f',
        '\r': '\\r',
        '"' : '\\"',
        '\\': '\\\\'
    };

function quote(string) {
    // If the string contains no control characters, no quote characters, no
    // backslash characters, and no non-ASCII characters, then we can safely
    // slap some quotes around it. Otherwise we must also replace the offending
    // characters with safe escape sequences.

    // test() on a global regex starts searching at lastIndex, so reset it first.
    escapable.lastIndex = 0;
    return escapable.test(string) ?
        '"' + string.replace(escapable, function (a) {
            var c = meta[a];
            return typeof c === 'string' ? c :
                '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
        }) + '"' :
        '"' + string + '"';
}
```
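Since the question asks for a loop rather than a regular expression, here is a minimal sketch of the same escaping written as a plain for loop. The name quoteAscii is hypothetical; it is intended to produce the same output as quote(), escaping each UTF-16 code unit separately, so a character outside the Basic Multilingual Plane such as U+1F600 comes out as the surrogate pair \ud83d\ude00.

```javascript
// Hypothetical loop-based variant of quote(): walks the string one UTF-16
// code unit at a time instead of using a regular expression.
function quoteAscii(string) {
  var out = '"';
  for (var i = 0; i < string.length; i++) {
    var ch = string.charAt(i);
    var code = string.charCodeAt(i);
    switch (ch) {
      case '\b': out += '\\b'; break;
      case '\t': out += '\\t'; break;
      case '\n': out += '\\n'; break;
      case '\f': out += '\\f'; break;
      case '\r': out += '\\r'; break;
      case '"':  out += '\\"'; break;
      case '\\': out += '\\\\'; break;
      default:
        if (code < 0x20 || code >= 0x7f) {
          // Remaining control characters and everything outside printable
          // ASCII (including each half of a surrogate pair) become \uXXXX.
          out += '\\u' + ('0000' + code.toString(16)).slice(-4);
        } else {
          out += ch;
        }
    }
  }
  return out + '"';
}
```

Only the printable range 0x20-0x7E passes through unchanged, so the result is pure ASCII and can still be decoded with JSON.parse.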
This produces a valid, ASCII-only, JavaScript-quoted version of the input string. For example, quote("Doppelgänger!") returns "Doppelg\u00e4nger!" (including the surrounding double quotes).
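To revert the encoding, parse the result as a JSON string. eval(encoded) would also work, but JSON.parse is the safer choice because it never executes code:
```javascript
var encoded = quote("Doppelgänger!");
var back = JSON.parse(encoded); // or eval(encoded), which is unsafe on untrusted input
```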
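If you would rather not rely on JSON.parse, the decoding can also be done with a loop. This is a rough sketch under the assumption that the input is exactly what quote() emits: a double-quoted string using only the \b, \t, \n, \f, \r, \", \\ and \uXXXX escapes (\/ is accepted too, since JSON allows it). unquoteAscii is a hypothetical name, and the function is not a full JSON validator; for example, it does not reject an unescaped quote in the middle of the string.

```javascript
// Hypothetical loop-based inverse of quote(). Assumes the input uses only
// the escape sequences listed in `simple` plus \uXXXX.
function unquoteAscii(encoded) {
  var simple = { b: '\b', t: '\t', n: '\n', f: '\f', r: '\r',
                 '"': '"', '\\': '\\', '/': '/' };
  if (encoded.length < 2 || encoded.charAt(0) !== '"' ||
      encoded.charAt(encoded.length - 1) !== '"') {
    throw new SyntaxError('Expected a double-quoted string');
  }
  var end = encoded.length - 1;   // index of the closing quote
  var out = '';
  for (var i = 1; i < end; i++) {
    var ch = encoded.charAt(i);
    if (ch !== '\\') {
      out += ch;                  // plain ASCII character, copy as-is
      continue;
    }
    if (i + 1 >= end) throw new SyntaxError('Dangling backslash');
    var next = encoded.charAt(++i);
    if (next === 'u') {
      var hex = encoded.slice(i + 1, i + 5);
      if (!/^[0-9a-fA-F]{4}$/.test(hex) || i + 5 > end) {
        throw new SyntaxError('Bad \\u escape');
      }
      // Each \uXXXX is one UTF-16 code unit; two consecutive surrogate
      // halves recombine into a single character in the result string.
      out += String.fromCharCode(parseInt(hex, 16));
      i += 4;
    } else if (Object.prototype.hasOwnProperty.call(simple, next)) {
      out += simple[next];
    } else {
      throw new SyntaxError('Unknown escape \\' + next);
    }
  }
  return out;
}
```

Surrogate pairs need no special handling: appending both halves with String.fromCharCode rebuilds the original character, so unquoteAscii('"\\ud83d\\ude00"') gives back the single emoji U+1F600.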
Any UTF-8 string that can be reversibly converted to ASCII is already ASCII. UTF-8 can represent any Unicode character; ASCII cannot.
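To see why, it helps to look at the actual UTF-8 bytes: every non-ASCII character is encoded with bytes of 0x80 or above, which have no meaning in ASCII. The sketch below assumes the TextEncoder API, which browsers and Node.js (version 11 and later) expose globally but which is not part of core ECMAScript; isAsciiUtf8 is a hypothetical helper name.

```javascript
// Hypothetical helper: returns true when every UTF-8 byte of the string is
// below 0x80, i.e. its UTF-8 encoding is byte-for-byte identical to ASCII.
// Assumes a global TextEncoder (browsers, Node.js 11+).
function isAsciiUtf8(string) {
  var bytes = new TextEncoder().encode(string);
  for (var i = 0; i < bytes.length; i++) {
    if (bytes[i] > 0x7f) return false;
  }
  return true;
}
```

For instance, new TextEncoder().encode("ä") yields the two bytes 0xC3 0xA4, neither of which is a valid ASCII code, so "Doppelgänger!" fails the check while "Doppelganger!" passes.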