
How to detect if a string is encoded with escape() or encodeURIComponent()

I have a web service that receives data from various clients. Some of them send the data encoded with escape(), while others use encodeURIComponent(). Is there a way to detect which function was used to encode the data?

Rodrigo asked Aug 14 '09 03:08



4 Answers

This won't help on the server side, but on the client side I have used JavaScript exceptions to detect whether the URL encoding produced ISO Latin or UTF-8 output.

decodeURIComponent throws an exception (a URIError) on invalid UTF-8 sequences.

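let result;
try {
    // Succeeds for valid UTF-8 percent-encoding, e.g. from encodeURIComponent()
    result = decodeURIComponent(string);
} catch (e) {
    // Invalid UTF-8: assume escape()-style encoding
    result = unescape(string);
}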

For example, the ISO Latin-encoded umlaut 'ä' (%E4) will throw an exception in Firefox, but the UTF-8-encoded 'ä' (%C3%A4) will not.
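
The check above can be wrapped so the caller also learns which decoder succeeded. This is only a minimal sketch: the name decodeEither and the shape of the returned object are illustrative assumptions, not part of any library. Note that the fallback also fires for the %uXXXX form produced by escape(), which decodeURIComponent rejects as malformed.

```javascript
// Hypothetical helper: try UTF-8 percent-decoding first, then fall back to
// unescape() (Latin-1 / %uXXXX semantics) when the input is not valid UTF-8.
function decodeEither(input) {
    try {
        return { value: decodeURIComponent(input), decoder: "decodeURIComponent" };
    } catch (e) {
        if (!(e instanceof URIError)) {
            throw e; // unrelated failure, do not mask it
        }
        return { value: unescape(input), decoder: "unescape" };
    }
}

console.log(decodeEither("%C3%A4")); // UTF-8 bytes for 'ä'
console.log(decodeEither("%E4"));    // Latin-1 byte for 'ä', invalid as UTF-8
console.log(decodeEither("%u20AC")); // escape()-style code unit for '€'
```

The result is a guess rather than a proof. Pure ASCII input decodes identically either way, and some escape() output (for example escape("Ã¤"), which is "%C3%A4") happens to be valid UTF-8 as well.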

See Also

  • decodeURIComponent vs unescape, what is wrong with unescape?
  • Comparing escape(), encodeURI(), and encodeURIComponent()

mika answered Nov 01 '22 06:11


Encourage your clients to use encodeURIComponent(). See this page for an explanation: Comparing escape(), encodeURI(), and encodeURIComponent(). If you really want to try to figure out exactly how something was encoded, you can try to look for some of the characters that escape() and encodeURI() do not encode.
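
As a rough sketch of that idea: escape() leaves @ + / unencoded and emits %uXXXX for code units above 0xFF, while encodeURIComponent() leaves ! ~ ' ( ) unencoded and emits UTF-8 bytes. The function name guessEncoder and its return values are hypothetical, and the result is a heuristic, not a guarantee.

```javascript
// Hypothetical heuristic: returns "escape", "encodeURIComponent", or "unknown".
function guessEncoder(encoded) {
    // %uXXXX is only produced by escape().
    if (/%u[0-9a-fA-F]{4}/.test(encoded)) return "escape";
    // Literal @ + / survive escape() but not encodeURIComponent().
    if (/[@+\/]/.test(encoded)) return "escape";
    // Literal ! ~ ' ( ) survive encodeURIComponent() but not escape().
    if (/[!~'()]/.test(encoded)) return "encodeURIComponent";
    // Encoded @ + / can only come from encodeURIComponent().
    if (/%(40|2B|2F)/i.test(encoded)) return "encodeURIComponent";
    // Encoded ! ~ ' ( ) can only come from escape().
    if (/%(21|7E|27|28|29)/i.test(encoded)) return "escape";
    // A byte >= 0x80 that does not form valid UTF-8 points to escape().
    if (/%[89a-fA-F][0-9a-fA-F]/.test(encoded)) {
        try {
            decodeURIComponent(encoded);
            return "encodeURIComponent"; // valid UTF-8: likely, not certain
        } catch (e) {
            return "escape";
        }
    }
    return "unknown"; // nothing distinguishing, e.g. "hello%20world"
}

console.log(guessEncoder(escape("a@b.com")));             // "escape"
console.log(guessEncoder(encodeURIComponent("a@b.com"))); // "encodeURIComponent"
console.log(guessEncoder(escape("hello world")));         // "unknown"
```

One caveat: a literal + may also come from form encoding (spaces sent as +) rather than from escape(), so the second rule can misfire on such input.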

Derek Swingley answered Nov 01 '22 08:11


Thanks to @mika for the great answer. One possible improvement, since the unescape function is considered deprecated, is to check that it exists before calling it:

declare function unescape(s: string): string;

function decodeURItoString(str: string): string {
    let resp = str;

    try {
        resp = decodeURI(str);
    } catch (e) {
        console.log('ERROR: Cannot decodeURI string!');

        // Fall back to unescape() only if the runtime still provides it.
        // typeof is safe even when the identifier is not defined at all.
        if (typeof unescape === 'function') {
            resp = unescape(str);
        }
    }

    return resp;
}

Dudi answered Nov 01 '22 07:11


I realize this is an old question, but I am unaware of a better solution. So I do it like this (thanks to a comment by RobertPitt above):

function isEncoded(str) {
    return typeof str == "string" && decodeURIComponent(str) !== str;
}

I have not yet encountered a case where this failed, which doesn't mean such a case doesn't exist. Maybe someone could shed some light on this.
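
A few inputs where the check above may not behave as hoped, shown as a sketch. The cases follow from how decodeURIComponent treats the % character. isEncodedSafe is a hypothetical name for a variant that swallows the exception.

```javascript
// Copy of the check from the answer above.
function isEncoded(str) {
    return typeof str == "string" && decodeURIComponent(str) !== str;
}

// Hypothetical variant: treats malformed input as "not encoded"
// instead of letting the URIError propagate.
function isEncodedSafe(str) {
    if (typeof str !== "string") return false;
    try {
        return decodeURIComponent(str) !== str;
    } catch (e) {
        return false;
    }
}

// 1. A bare '%' makes decodeURIComponent throw a URIError.
let threw = false;
try {
    isEncoded("100% sure");
} catch (e) {
    threw = e instanceof URIError;
}
console.log(threw);                      // true
console.log(isEncodedSafe("100% sure")); // false

// 2. Encoding can be a no-op, so an encoded value may look unencoded.
console.log(isEncoded(encodeURIComponent("abc"))); // false

// 3. Plain text that merely looks like an escape sequence is reported as encoded.
console.log(isEncoded("%41")); // true
```

Neither function says which encoder was used, and isEncodedSafe reports escape()-style output such as "%E4" as not encoded, because that input is not valid UTF-8.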

Dejan Janjušević answered Nov 01 '22 06:11