I am handling utf-8 strings in JavaScript and need to escape them.
Both escape() / unescape() and encodeURI() / decodeURI() work in my browser.
escape()
> var hello = "안녕하세요"
> var hello_escaped = escape(hello)
> hello_escaped
"%uC548%uB155%uD558%uC138%uC694"
> var hello_unescaped = unescape(hello_escaped)
> hello_unescaped
"안녕하세요"
encodeURI()
> var hello = "안녕하세요"
> var hello_encoded = encodeURI(hello)
> hello_encoded
"%EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94"
> var hello_decoded = decodeURI(hello_encoded)
> hello_decoded
"안녕하세요"
However, Mozilla says that escape() is deprecated.
Although encodeURI() and decodeURI() work with the above utf-8 string, the docs (as well as the function names themselves) tell me that these methods are for URIs; I do not see utf-8 strings mentioned anywhere.
Simply put, is it okay to use encodeURI() and decodeURI() for utf-8 strings?
encodeURIComponent should be used to encode a URI Component - a string that is supposed to be part of a URL. encodeURI should be used to encode a URI or an existing URL.
The encodeURI() function encodes a URI by replacing each instance of certain characters by one, two, three, or four escape sequences representing the UTF-8 encoding of the character (will only be four escape sequences for characters composed of two "surrogate" characters).
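A quick way to see the "one, two, three, or four escape sequences" in practice is to encode one character of each UTF-8 length. This is only an illustrative sketch; the helper name countEscapes and the sample characters are my own choices, not something from the docs.

```javascript
// Count the %XX escape sequences encodeURI() emits for a single character.
// countEscapes is a hypothetical helper written for this illustration.
function countEscapes(ch) {
  const encoded = encodeURI(ch);
  const matches = encoded.match(/%[0-9A-F]{2}/g) || [];
  return { encoded: encoded, count: matches.length };
}

const samples = [
  " ",         // U+0020: 1 UTF-8 byte
  "\u00E9",    // é: 2 UTF-8 bytes
  "\uC548",    // 안: 3 UTF-8 bytes
  "\u{1F600}", // 😀: a surrogate pair in UTF-16, 4 UTF-8 bytes
];

for (const ch of samples) {
  const result = countEscapes(ch);
  console.log(JSON.stringify(ch), "->", result.encoded, "(" + result.count + ")");
}
```

Only the last sample is stored as two "surrogate" code units in a JavaScript string, which is why it is the only one that produces four escape sequences.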
The escape() function is deprecated. Use encodeURI() or encodeURIComponent() instead.
The difference between encodeURI and encodeURIComponent is that encodeURIComponent encodes nearly every character, including URL delimiters such as : / ? & = and #, whereas encodeURI leaves those delimiters alone, so the protocol prefix ('http://'), the domain name, and the overall structure of a URL stay intact.
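To make the distinction concrete, here is a small sketch. The URL and the query value are made up for the example.

```javascript
// Hypothetical parameter value that itself contains URL delimiters.
const query = "안녕&x=1";
const base = "http://example.com/search?q=";

// encodeURI: meant for a whole URL, so ":", "/", "?", "&" and "=" are kept.
// The "&x=1" inside the value would therefore be read as a second parameter.
const wholeUrl = encodeURI(base + query);
console.log(wholeUrl);
// http://example.com/search?q=%EC%95%88%EB%85%95&x=1

// encodeURIComponent: meant for one piece of a URL, so "&" and "=" are escaped.
const component = encodeURIComponent(query);
console.log(component);
// %EC%95%88%EB%85%95%26x%3D1

// Building the URL from an encoded component keeps the value in one parameter.
const safeUrl = base + component;
console.log(decodeURIComponent(component) === query); // true
```

In both cases the non-ASCII characters come out as percent-encoded UTF-8 bytes; the two functions differ only in which ASCII characters they leave alone.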
Hi!
When it comes to escape and unescape, I live by two rules:
As mentioned in the question, both escape and unescape have been deprecated. In general, one should avoid using deprecated functions.
So, if encodeURIComponent or encodeURI does the trick for you, you should use that instead of escape.
Browsers will, as far as possible, strive to achieve backwards compatibility. All major browsers have already implemented escape and unescape; why would they un-implement them?
Browsers would have to redefine escape and unescape only if a new specification required them to do so. But wait! The people who write specifications are quite smart. They, too, are interested in not breaking backwards compatibility!
I realize that the above argument is weak. But trust me, ... when it comes to browsers, deprecated stuff works. This even includes deprecated HTML tags like <xmp> and <center>.
escape and unescape:
So naturally, the next question is, when would one use escape or unescape?
Recently, while working on CloudBrave, I had to deal with utf8, latin1 and inter-conversions.
After reading a bunch of blog posts, I realized how simple this was:
var utf8_to_latin1 = function (s) {
    return unescape(encodeURIComponent(s));
};
var latin1_to_utf8 = function (s) {
    return decodeURIComponent(escape(s));
};
These inter-conversions, without using escape and unescape, are rather involved. By not avoiding escape and unescape, life becomes simpler.
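For comparison, here is a rough sketch of what the same two conversions might look like without escape and unescape. It assumes TextEncoder and TextDecoder are available (modern browsers and Node.js), and the function names toByteString and fromByteString are invented for this illustration. Note that "latin1" here really means a string holding one character per UTF-8 byte.

```javascript
// Sketch only: names are hypothetical, and TextEncoder/TextDecoder are assumed to exist.

// Text -> string with one char (0x00-0xFF) per UTF-8 byte.
function toByteString(s) {
  const bytes = new TextEncoder().encode(s); // Uint8Array of UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// String with one char per UTF-8 byte -> text.
function fromByteString(s) {
  const bytes = Uint8Array.from(s, function (c) { return c.charCodeAt(0); });
  return new TextDecoder("utf-8").decode(bytes);
}

const original = "안녕하세요";
const byteString = toByteString(original);

console.log(byteString.length);                    // 15 (five characters, three bytes each)
console.log(fromByteString(byteString) === original); // true
```

The escape/unescape one-liners do the same job in a single expression, which is the convenience being argued for above.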
Hope this helps.
It is never okay to use encodeURI() or encodeURIComponent(). Let's try it out:
console.log(encodeURIComponent('@#*'));
Input: @#*. Output: %40%23*. Wait, so what exactly happened to the * character? Why wasn't that converted? Imagine this: you ask a user which file to delete and their response is *. Server-side, you convert that using encodeURIComponent() and then run rm *. Well, I've got news for you: using encodeURIComponent() means you just deleted all files.
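The * is not the only character that slips through. As a quick check (the variable names are just for this example), the following shows the punctuation that encodeURIComponent() returns unchanged:

```javascript
// Punctuation that encodeURIComponent() leaves as-is, alongside letters and digits.
const untouched = "-_.!~*'()";

console.log(encodeURIComponent(untouched) === untouched); // true
console.log(encodeURIComponent("@#*"));                   // %40%23*
```

So anything downstream that treats !, ', (, ) or * as special still sees them in their raw form.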
Use fixedEncodeURI() when trying to encode a complete URL (i.e., all of example.com?arg=val), as defined and further explained at the MDN encodeURI() Documentation...
function fixedEncodeURI(str) {
    return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}
Or, you may need to use fixedEncodeURIComponent() when trying to encode part of a URL (i.e., the arg or the val in example.com?arg=val), as defined and further explained at the MDN encodeURIComponent() Documentation...
function fixedEncodeURIComponent(str) {
    return encodeURIComponent(str).replace(/[!'()*]/g, function (c) {
        return '%' + c.charCodeAt(0).toString(16);
    });
}
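Here is a short usage sketch of the two helpers, repeated so the snippet runs on its own. The sample inputs (an IPv6 localhost URL and a couple of strings) are made up for illustration.

```javascript
function fixedEncodeURI(str) {
  // encodeURI() escapes "[" and "]"; put them back (they appear in IPv6 hosts).
  return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}

function fixedEncodeURIComponent(str) {
  // Additionally escape the characters encodeURIComponent() leaves alone: ! ' ( ) *
  return encodeURIComponent(str).replace(/[!'()*]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16);
  });
}

const plainUrl = encodeURI("http://[::1]:8080/?q=a b");
const fixedUrl = fixedEncodeURI("http://[::1]:8080/?q=a b");
console.log(plainUrl); // http://%5B::1%5D:8080/?q=a%20b
console.log(fixedUrl); // http://[::1]:8080/?q=a%20b

const star = fixedEncodeURIComponent("@#*");
const quoted = fixedEncodeURIComponent("it's (fine)!");
console.log(star);   // %40%23%2a
console.log(quoted); // it%27s%20%28fine%29%21
```

As written, the helper emits lowercase hex for * (%2a). That decodes fine, but if you want uppercase digits throughout, as RFC 3986 recommends for consistency, you could append .toUpperCase() to the toString(16) call.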
If you are unable to distinguish them based on the above description, I always like to simplify it with:
fixedEncodeURI(): will not encode +@?=:#;,$& to their percent-encoded equivalents (as & and + are common URL operators).
fixedEncodeURIComponent(): will encode +@?=:#;,$& to their percent-encoded equivalents.
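If you want to verify that summary yourself, this small check runs the same ten characters through the built-in functions that the two helpers wrap (the variable names are only for this example):

```javascript
const reserved = "+@?=:#;,$&";

// encodeURI() (and therefore fixedEncodeURI()) leaves all of them untouched.
const viaEncodeURI = encodeURI(reserved);
console.log(viaEncodeURI); // +@?=:#;,$&

// encodeURIComponent() (and therefore fixedEncodeURIComponent()) escapes every one.
const viaEncodeURIComponent = encodeURIComponent(reserved);
console.log(viaEncodeURIComponent); // %2B%40%3F%3D%3A%23%3B%2C%24%26
```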