
How do I decode a string with escaped unicode?

I'm not sure what this is called so I'm having trouble searching for it. How can I decode a string with unicode from http\u00253A\u00252F\u00252Fexample.com to http://example.com with JavaScript? I tried unescape, decodeURI, and decodeURIComponent so I guess the only thing left is string replace.

EDIT: The string is not typed, but rather a substring from another piece of code. So to solve the problem you have to start with something like this:

var s = 'http\\u00253A\\u00252F\\u00252Fexample.com'; 

I hope that shows why unescape() doesn't work.
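To make the difference concrete, here is a minimal sketch contrasting a typed literal with a string that contains literal backslashes. The variable names `typed` and `raw` are made up for this illustration.

```javascript
// Typed in source: the JavaScript parser resolves \u0025 to "%" immediately.
const typed = 'http\u00253A\u00252F\u00252Fexample.com';

// Obtained as a substring elsewhere: the backslashes are real characters.
const raw = 'http\\u00253A\\u00252F\\u00252Fexample.com';

console.log(typed);                     // http%3A%2F%2Fexample.com
console.log(decodeURIComponent(typed)); // http://example.com

// raw contains no "%" at all, so percent-decoding leaves it unchanged.
console.log(decodeURIComponent(raw) === raw); // true
```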

styfle asked Oct 25 '11 05:10



2 Answers

Edit (2017-10-12):

@MechaLynx and @Kevin-Weber note that unescape() is deprecated in non-browser environments and does not exist in TypeScript. For this input, decodeURIComponent works as a replacement. For broader compatibility, use the following instead:

decodeURIComponent(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
// 'http://example.com'

Original answer:

unescape(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
// 'http://example.com'

You can offload all the work to JSON.parse
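As a rough sketch of how this might look when the input arrives in a variable rather than as a literal: the helper name `decodeEscapedUrl` is hypothetical, and the code assumes the input contains no unescaped double quotes or stray backslashes, since either would make JSON.parse throw.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumes `raw` is valid as the inside of a JSON string literal.
function decodeEscapedUrl(raw) {
  // Wrapping the text in double quotes turns it into a JSON string literal,
  // so JSON.parse resolves each \uXXXX escape (here \u0025 becomes "%").
  const percentEncoded = JSON.parse('"' + raw + '"');
  // Percent-decoding then turns %3A into ":" and %2F into "/".
  return decodeURIComponent(percentEncoded);
}

const s = 'http\\u00253A\\u00252F\\u00252Fexample.com';
console.log(decodeEscapedUrl(s)); // http://example.com
```

If the input may contain arbitrary characters, wrapping the call in try/catch is a reasonable precaution, because both JSON.parse and decodeURIComponent throw on malformed input.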

radicand answered Sep 29 '22 09:09


UPDATE: Please note that this solution is intended for older browsers or non-browser platforms and is kept for instructional purposes. Please refer to @radicand's answer for a more up-to-date approach.


This is a percent-encoded (URL-escaped) string whose percent signs were then written as Unicode escape sequences, so each % appears as \u0025. To convert it back to normal:

var x = "http\\u00253A\\u00252F\\u00252Fexample.com";
// Match a literal backslash, "u", and four hex digits; capture the digits.
var r = /\\u([0-9a-f]{4})/gi;
x = x.replace(r, function (match, grp) {
    return String.fromCharCode(parseInt(grp, 16));
});
console.log(x);  // http%3A%2F%2Fexample.com
x = unescape(x);
console.log(x);  // http://example.com

To explain: I use a regular expression to look for sequences such as \u0025. However, since I need only part of each match for my replace operation, I use parentheses to isolate the part I'm going to reuse, 0025. This isolated part is called a group.

The gi part at the end of the expression denotes it should match all instances in the string, not just the first one, and that the matching should be case insensitive. This might look unnecessary given the example, but it adds versatility.

Now, to convert from one string to the next, I need to execute some steps on each group of each match, and I can't do that by simply transforming the string. Helpfully, the String.replace operation can accept a function, which will be executed for each match. The return of that function will replace the match itself in the string.

I use the second parameter this function accepts, which is the captured group, parse it as a hexadecimal number, and convert it to the corresponding character with String.fromCharCode. The built-in unescape function then decodes the resulting percent-encoded string to its proper form.
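The steps described above can be sketched as a small reusable function. This is a variant under stated assumptions, not the answer's exact code: the name `decodeUnicodeEscapes` is hypothetical, the pattern is restricted to hexadecimal digits, and decodeURIComponent stands in for unescape in the final step.

```javascript
// Hypothetical helper: replaces each literal \uXXXX sequence with its character.
function decodeUnicodeEscapes(text) {
  // Literal backslash, "u", then exactly four hex digits (captured as a group).
  return text.replace(/\\u([0-9a-f]{4})/gi, function (match, hex) {
    // Parse the captured digits as base 16 and build the character from that code unit.
    return String.fromCharCode(parseInt(hex, 16));
  });
}

const input = 'http\\u00253A\\u00252F\\u00252Fexample.com';
const percentEncoded = decodeUnicodeEscapes(input);
const decoded = decodeURIComponent(percentEncoded);

console.log(percentEncoded); // http%3A%2F%2Fexample.com
console.log(decoded);        // http://example.com
```

Restricting the group to hex digits means sequences such as \uzzzz are left untouched instead of being turned into a NUL character.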

Ioannis Karadimas answered Sep 29 '22 09:09