
decodeURIComponent vs unescape, what is wrong with unescape?

While answering another question I realised that my JavaScript/DOM knowledge is somewhat out of date: I am still using escape/unescape to encode the contents of URL components, whereas it appears I should now be using encodeURIComponent/decodeURIComponent instead.

What I want to know is: what is wrong with escape/unescape? There are some vague suggestions that there is some sort of problem around Unicode characters, but I can't find any definite explanation.

My web experience is fairly biased: almost all of it has been writing big intranet apps tied to Internet Explorer. That has involved a lot of use of escape/unescape, and the apps involved have fully supported Unicode for many years now.

So what are the Unicode problems that escape/unescape are supposed to have? Does anyone have any test cases to demonstrate the problems?

andynormancx asked Mar 06 '09 15:03




2 Answers

What I want to know is what is wrong with escape/unescape ?

They're not “wrong” as such, they're just their own special string format which looks a bit like URI-parameter-encoding but actually isn't. In particular:

  • ‘+’ means plus, not space
  • there is a special “%uNNNN” format for encoding UTF-16 code units, instead of encoding UTF-8 bytes

So if you use escape() to create URI parameter values you will get the wrong results for strings containing a plus, or any non-ASCII characters.
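The two differences listed above can be checked with a short test case. This is only a sketch: the sample strings and the helper name `tryDecode` are made up for illustration, and `URLSearchParams` and `decodeURIComponent` stand in for whatever decoder actually receives the parameter.

```javascript
// Sample inputs chosen for illustration: a plus sign, a Latin-1 character,
// and a character above code point 255.
const samples = ["a+b", "é", "€"];
for (const s of samples) {
  console.log(JSON.stringify(s), escape(s), encodeURIComponent(s));
}

// A form-style query decoder treats '+' as a space, so the plus that
// escape() leaves untouched is lost on the receiving side.
const fromEscape = new URLSearchParams("q=" + escape("a+b")).get("q");
const fromEncode = new URLSearchParams("q=" + encodeURIComponent("a+b")).get("q");

// Hypothetical helper: returns the decoded string, or the error name on failure.
function tryDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (e) {
    return e.name;
  }
}

// escape("é") is "%E9", which is not valid UTF-8 on its own;
// escape("€") is "%u20AC", which is not a %XX sequence at all.
const latin1Result = tryDecode(escape("é"));
const unicodeResult = tryDecode(escape("€"));
console.log(fromEscape, fromEncode, latin1Result, unicodeResult);
```

In this sketch a UTF-8 based decoder rejects both non-ASCII outputs of escape(), while the encodeURIComponent output for the same strings decodes cleanly.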

escape() could be used as an internal JavaScript-only encoding scheme, for example to escape cookie values. However, now that all browsers support encodeURIComponent (which wasn't originally the case), there's no reason to use escape in preference to that.

There is only one modern use for escape/unescape that I know of, and that's as a quick way to implement a UTF-8 encoder/decoder, by leveraging the UTF-8 processing in URIComponent handling:

    utf8bytes = unescape(encodeURIComponent(unicodecharacters));
    unicodecharacters = decodeURIComponent(escape(utf8bytes));
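To see what that trick produces, here is a small sketch wrapping the two expressions in functions. The names `toUtf8ByteString` and `fromUtf8ByteString` are hypothetical, not from the answer; the "bytes" are returned as a string in which each character holds one byte value (0 to 255).

```javascript
// Hypothetical wrapper names around the two one-liners from the answer.
function toUtf8ByteString(text) {
  // encodeURIComponent emits %XX per UTF-8 byte; unescape turns each %XX
  // back into a single character with that byte's value.
  return unescape(encodeURIComponent(text));
}

function fromUtf8ByteString(byteString) {
  // escape re-creates the %XX sequences; decodeURIComponent reads them as UTF-8.
  return decodeURIComponent(escape(byteString));
}

const utf8bytes = toUtf8ByteString("€");
// One array entry per byte of the UTF-8 encoding of U+20AC.
const byteValues = Array.from(utf8bytes, (ch) => ch.charCodeAt(0));
const roundTrip = fromUtf8ByteString(utf8bytes);

console.log(byteValues.map((b) => b.toString(16)), roundTrip);
```

For "€" the byte string has three characters with values 0xE2, 0x82 and 0xAC, and decoding it gives back the original character.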
bobince answered Sep 21 '22 13:09



escape uses the %XX form only for characters in the range 0 to 255 inclusive (ISO-8859-1, which is effectively the Unicode code points representable with a single byte). (*)

encodeURIComponent works for all well-formed strings JavaScript can represent, which covers the whole Unicode range and not just the Basic Multilingual Plane, i.e. Unicode code points 0 to 1,114,111 (0x10FFFF), covering almost any human writing system in current use. The one exception is a string containing a lone surrogate, for which it throws a URIError.
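A quick way to probe the edges of that range is a character outside the Basic Multilingual Plane and an unpaired surrogate. The sample character U+1F600 is an arbitrary choice for this sketch.

```javascript
// U+1F600 lies outside the BMP, so JavaScript stores it as a surrogate pair.
const astral = "\u{1F600}";

// encodeURIComponent combines the pair and percent-encodes the 4 UTF-8 bytes.
const encodedAstral = encodeURIComponent(astral);

// escape works on each UTF-16 code unit separately.
const escapedAstral = escape(astral);

// Half of a surrogate pair is not a valid code point on its own.
let loneSurrogateError = null;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  loneSurrogateError = e;
}

console.log(encodedAstral, escapedAstral, loneSurrogateError && loneSurrogateError.name);
```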

Both functions produce URL-safe strings that use only code points 0 to 127 inclusive (US-ASCII). encodeURIComponent accomplishes this by first encoding the string as UTF-8 and then applying the %XX hex encoding familiar from escape to every byte that would not be URL-safe.
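The "UTF-8 first, then %XX" description can be sketched by hand and compared against the built-in. This is an illustration only: `percentEncodeUtf8` is a hypothetical name, it assumes `TextEncoder` is available (current browsers and Node.js), and unlike encodeURIComponent it does not throw on lone surrogates because `TextEncoder` substitutes U+FFFD for them.

```javascript
// Characters that encodeURIComponent leaves unescaped.
const UNRESERVED = /[A-Za-z0-9\-_.!~*'()]/;

// Hypothetical re-implementation for well-formed strings.
function percentEncodeUtf8(text) {
  let out = "";
  // Step 1: encode the string as UTF-8 bytes.
  for (const byte of new TextEncoder().encode(text)) {
    const ch = String.fromCharCode(byte);
    // Step 2: keep URL-safe ASCII as-is, write every other byte as %XX.
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch;
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

const sample = "naïve café + €/\u{1F600}";
const manual = percentEncodeUtf8(sample);
const builtin = encodeURIComponent(sample);
console.log(manual === builtin, manual);
```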

This is incidentally why you can make a two-function-call UTF-8 encoder/decoder in JavaScript without writing any loops, by combining these primitives so that everything but the UTF-8 processing cancels out: unescape(encodeURIComponent(s)) encodes, and decodeURIComponent(escape(s)) does the same in reverse.

(*) Footnote: Current browsers such as Google Chrome produce %uXXXX for characters above 255 (the ECMAScript specification defines this form in its annex on legacy browser features), but web server support for decoding that encoding is not as well-implemented as decoding the IETF-standardized UTF-8 based encoding.

ecmanaut answered Sep 25 '22 13:09
