I'm not sure what this is called, so I'm having trouble searching for it. How can I decode a string with Unicode escapes from http\u00253A\u00252F\u00252Fexample.com to http://example.com with JavaScript? I tried unescape, decodeURI, and decodeURIComponent, so I guess the only thing left is string replace.
EDIT: The string is not typed, but rather a substring from another piece of code. So to solve the problem you have to start with something like this:
var s = 'http\\u00253A\\u00252F\\u00252Fexample.com';
I hope that shows why unescape() doesn't work.
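To make the difference concrete, here is a small sketch (the variable names are only illustrative) contrasting an escape that the JavaScript parser resolves in source code with a literal backslash sequence that reaches the program as six separate characters:

```javascript
// Escape written in source code: the parser resolves \u0025 to "%" up front.
const typed = 'http\u00253A\u00252F\u00252Fexample.com';

// Literal backslashes, as when the text is a substring of other code or data.
const literal = 'http\\u00253A\\u00252F\\u00252Fexample.com';

console.log(typed);   // http%3A%2F%2Fexample.com
console.log(literal); // http\u00253A\u00252F\u00252Fexample.com

// decodeURIComponent only understands %XX sequences, so it finishes the job
// for `typed` but leaves `literal` untouched.
console.log(decodeURIComponent(typed));   // http://example.com
console.log(decodeURIComponent(literal)); // http\u00253A\u00252F\u00252Fexample.com
```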
A Unicode escape sequence is a backslash followed by the letter 'u' followed by four hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence with the value specified by the four digits. For example, "\u0041" matches the target sequence "A" when the ASCII character encoding is used.
In Python source code, Unicode literals are written as strings prefixed with the 'u' or 'U' character: u'abcdefghijk' . Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.
According to section 3.3 of the Java Language Specification (JLS) a unicode escape consists of a backslash character (\) followed by one or more 'u' characters and four hexadecimal digits. So for example \u000A will be treated as a line feed.
If your source isn't a Unicode character ( char ) but a String, you must use charAt(index) to get the Unicode character at position index .
Edit (2017-10-12):
@MechaLynx and @Kevin-Weber note that unescape() is deprecated, is not guaranteed to be available in non-browser environments, and does not exist in TypeScript. decodeURIComponent is a drop-in replacement here. For broader compatibility, use the below instead:
decodeURIComponent(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
> 'http://example.com'
Original answer:
unescape(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
> 'http://example.com'
You can offload the work of decoding the \u escapes to JSON.parse.
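When the string arrives in a variable rather than as a literal, the same idea can be wrapped in a function. This is a minimal sketch, not a hardened implementation: decodeEscapedUrl is a hypothetical name, and it assumes the input contains no double quotes, control characters, or backslash sequences that are invalid inside a JSON string.

```javascript
// Hypothetical helper: decode literal \uXXXX escapes, then percent-escapes.
function decodeEscapedUrl(raw) {
  // Assumption: `raw` is valid as the body of a JSON string. If it is not
  // (for example, it contains an unescaped double quote), JSON.parse throws.
  const withoutUnicodeEscapes = JSON.parse('"' + raw + '"');
  return decodeURIComponent(withoutUnicodeEscapes);
}

const s = 'http\\u00253A\\u00252F\\u00252Fexample.com';
console.log(decodeEscapedUrl(s)); // http://example.com
```

If the input may contain quotes or other arbitrary text, a regular-expression replace over the \uXXXX sequences is likely the safer route.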
UPDATE: Please note that this is a solution that should apply to older browsers or non-browser platforms, and is kept alive for instructional purposes. Please refer to @radicand's answer below for a more up to date answer.
This is a Unicode-escaped, percent-escaped string. First the string was percent-escaped (URL-encoded), then each % was written as the Unicode escape \u0025. To convert back to normal:
var x = "http\\u00253A\\u00252F\\u00252Fexample.com";
// Match a literal \u followed by four hex digits; the digits are captured as a group.
var r = /\\u([\da-f]{4})/gi;
x = x.replace(r, function (match, grp) {
    return String.fromCharCode(parseInt(grp, 16));
});
console.log(x); // http%3A%2F%2Fexample.com
x = unescape(x);
console.log(x); // http://example.com
To explain: I use a regular expression to look for \u0025. However, since I need only a part of this string for my replace operation, I use parentheses to isolate the part I'm going to reuse, 0025. This isolated part is called a group.
The gi part at the end of the expression denotes it should match all instances in the string, not just the first one, and that the matching should be case insensitive. This might look unnecessary given the example, but it adds versatility.
Now, to convert from one string to the next, I need to execute some steps on each group of each match, and I can't do that by simply transforming the string. Helpfully, the String.replace operation can accept a function, which will be executed for each match. The return of that function will replace the match itself in the string.
I use the second parameter this function accepts, which is the group I need, and convert it to the character with that code (here, 0025 becomes %). I then use the built-in unescape function to decode the resulting percent-encoded string to its proper form.
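The two steps described above can also be sketched as a single function. This version swaps unescape for decodeURIComponent, which is not what the answer above uses; decodeUnicodeEscapes is a hypothetical name, and the character class is restricted to hex digits so that something like \uzzzz is left alone.

```javascript
// Hypothetical helper combining the two steps described above.
function decodeUnicodeEscapes(input) {
  // Step 1: turn each literal \uXXXX into the character with that code unit.
  const percentEncoded = input.replace(/\\u([0-9a-f]{4})/gi, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
  // Step 2: decode the %XX sequences (decodeURIComponent in place of unescape).
  return decodeURIComponent(percentEncoded);
}

console.log(decodeUnicodeEscapes('http\\u00253A\\u00252F\\u00252Fexample.com'));
// http://example.com
```

Note that decodeURIComponent throws a URIError on malformed percent sequences, so callers handling untrusted input may want a try/catch around it.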