 

Difference in URL decode/encode UTF-8 between Java and JS/AS3 (bug!?)

I am having an issue URL decoding a UTF-8 string in Java that is encoded either with Javascript or Actionscript 3. I've set up a test case as follows:

The string in question is Produktgröße

When I encode with JS/AS3 I get the following string:

escape('Produktgröße')

Produktgr%F6%DFe

When I unescape this with JS I get no change

unescape('Produktgr%F6%DFe')

Produktgr%F6%DFe

So, by this I assume that JS isn't encoding the string properly??

The following JSP produces this output:

<%@page import="java.net.URLEncoder"%>
<%@page import="java.net.URLDecoder"%>
<%=(URLDecoder.decode("Produktgr%F6%DFe","UTF-8"))%><br/>
<%=(URLEncoder.encode("Produktgröße","UTF-8"))%><br/>
<%=(URLEncoder.encode("Produktgröße"))%><br/>
<%=(URLDecoder.decode(URLEncoder.encode("Produktgröße")))%><br/>
<%=(URLDecoder.decode(URLEncoder.encode("Produktgröße"),"UTF-8"))%><br/>

Produktgr?e

Produktgr%C3%B6%C3%9Fe

Produktgr%C3%B6%C3%9Fe

Produktgröße

Produktgröße

Any idea why I'm having this disparity with the languages and why JS/AS3 isn't behaving as I expect it to?

Thanks.

asked May 25 '11 22:05 by user710437




2 Answers

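escape is a deprecated function and does not correctly encode Unicode characters: it writes code units below 256 as a single %XX (effectively Latin-1) and anything above that as %uXXXX, rather than percent-encoding the UTF-8 bytes. Use encodeURI or encodeURIComponent instead; the latter is probably the most suitable for your needs.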
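As an illustration, here is a minimal sketch comparing the two functions on the string from the question. It assumes an environment where the legacy escape function is still available (browsers and Node.js both provide it); the variable names are only for this example.

```javascript
// The string from the question.
const original = "Produktgröße";

// Legacy escape(): code units below 256 become %XX, matching Latin-1 bytes.
const legacy = escape(original);

// encodeURIComponent(): percent-encodes the UTF-8 bytes of each character.
const utf8 = encodeURIComponent(original);

console.log(legacy); // Produktgr%F6%DFe
console.log(utf8);   // Produktgr%C3%B6%C3%9Fe

// decodeURIComponent reverses the UTF-8 form...
const roundTrip = decodeURIComponent(utf8);
console.log(roundTrip); // Produktgröße

// ...but rejects the legacy form, because %F6%DF is not valid UTF-8.
let rejected = false;
try {
  decodeURIComponent(legacy);
} catch (e) {
  rejected = e instanceof URIError;
}
console.log(rejected); // true
```

The encodeURIComponent output is the same string that URLEncoder.encode("Produktgröße","UTF-8") prints in the question, so a value encoded this way should decode cleanly with URLDecoder.decode(value, "UTF-8") on the Java side.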

answered Nov 14 '22 22:11 by Andy E


JavaScript's escape is URL-encoding your string using the Latin-1 charset, while Java is URL-encoding it using UTF-8. That is why ö becomes %F6 in one case and %C3%B6 in the other.

URL encoding really just replaces characters that are not allowed to appear literally with a % followed by the hex value of each byte. For example, even if you stick with ASCII characters, ( is encoded as %28. Once you start using non-ASCII characters (anything that does not fit in 7 bits), the result also depends on which character set is used to turn the characters into bytes.
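To make the byte-level difference concrete, the sketch below percent-encodes the same characters from their Latin-1 bytes and from their UTF-8 bytes. The helper names (percentEncodeBytes, latin1Bytes, utf8Bytes) are hypothetical and exist only for this illustration; it assumes a runtime with a global TextEncoder, such as a current browser or Node.js.

```javascript
// Percent-encode every byte of a byte sequence (illustration only;
// a real URL encoder leaves unreserved ASCII characters as they are).
function percentEncodeBytes(bytes) {
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// Latin-1: one byte per character, equal to its code point (up to U+00FF).
function latin1Bytes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// UTF-8: one to four bytes per character.
function utf8Bytes(text) {
  return new TextEncoder().encode(text);
}

console.log(percentEncodeBytes(latin1Bytes("öß"))); // %F6%DF
console.log(percentEncodeBytes(utf8Bytes("öß")));   // %C3%B6%C3%9F
console.log(percentEncodeBytes(utf8Bytes("(")));    // %28
```

The first line matches what escape produced in the question and the second matches the Java URLEncoder output, while the ASCII character ( encodes to %28 under either charset.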

answered Nov 14 '22 22:11 by Codebling