I am having an issue URL-decoding a UTF-8 string in Java that was encoded with either JavaScript or ActionScript 3. I've set up a test case as follows:
The string in question is Produktgröße
When I encode with JS/AS3 I get the following string:
escape('Produktgröße')
Produktgr%F6%DFe
When I unescape this with JS I get no change
unescape('Produktgr%F6%DFe')
Produktgr%F6%DFe
So, by this I assume that JS isn't encoding the string properly??
The following JSP produces this output:
<%@page import="java.net.URLEncoder"%>
<%@page import="java.net.URLDecoder"%>
<%=(URLDecoder.decode("Produktgr%F6%DFe","UTF-8"))%><br/>
<%=(URLEncoder.encode("Produktgröße","UTF-8"))%><br/>
<%=(URLEncoder.encode("Produktgröße"))%><br/>
<%=(URLDecoder.decode(URLEncoder.encode("Produktgröße")))%><br/>
<%=(URLDecoder.decode(URLEncoder.encode("Produktgröße"),"UTF-8"))%><br/>
Produktgr?e
Produktgr%C3%B6%C3%9Fe
Produktgr%C3%B6%C3%9Fe
Produktgröße
Produktgröße
Any idea why I'm having this disparity with the languages and why JS/AS3 isn't behaving as I expect it to?
Thanks.
The native character encoding of the Java programming language is UTF-16.
String objects in Java are encoded in UTF-16. Java Platform is required to support other character encodings or charsets such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently coded character data. There are two general types of encoding errors.
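To make the point about conversions concrete, here is a minimal sketch (the class and method names are made up for illustration) that turns the string from the question into bytes under two charsets and then reads the Latin-1 bytes back with the right and the wrong charset. The string is written with Unicode escapes so the encoding of the source file itself does not affect the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBytes {
    // "Produktgröße", spelled with Unicode escapes (ö = U+00F6, ß = U+00DF).
    static final String TEXT = "Produktgr\u00f6\u00dfe";

    // Number of bytes the string occupies once encoded in the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    // Encode with one charset, decode with another; equal charsets round-trip.
    static String transcode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // ISO-8859-1 uses one byte per character here, UTF-8 needs two bytes each for ö and ß.
        System.out.println(byteLength(TEXT, StandardCharsets.ISO_8859_1)); // 12
        System.out.println(byteLength(TEXT, StandardCharsets.UTF_8));      // 14

        // Same charset on both sides: the text survives.
        System.out.println(TEXT.equals(
                transcode(TEXT, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1))); // true

        // Latin-1 bytes read as UTF-8: the text is damaged.
        System.out.println(TEXT.equals(
                transcode(TEXT, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));      // false
    }
}
```

The same mismatch is what happens in the question, only with the bytes written as %XX escapes instead of raw bytes.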
public class URLDecoder extends Object. Utility class for HTML form decoding. This class contains static methods for decoding a String from the application/x-www-form-urlencoded MIME format. The conversion process is the reverse of that used by the URLEncoder class.
Decoding in JavaScript can be done with the decodeURI function, which reverses encodeURI; its counterpart decodeURIComponent reverses encodeURIComponent. The unescape() function takes a string as its single parameter and decodes a string that was encoded by the escape() function.
escape is a deprecated function and does not correctly encode Unicode characters. Use encodeURI or encodeURIComponent, the latter probably being the method most suitable for your needs.
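If the client switches to encodeURIComponent, the string arrives as percent-encoded UTF-8 bytes, the same form that appears in the question's JSP output (Produktgr%C3%B6%C3%9Fe). A minimal sketch of the Java side, assuming a hypothetical helper named decodeUtf8:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeUtf8Component {
    // Hypothetical helper: decode a percent-encoded string as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decoded = decodeUtf8("Produktgr%C3%B6%C3%9Fe");
        // Compare against the expected text written with Unicode escapes.
        System.out.println(decoded.equals("Produktgr\u00f6\u00dfe")); // true
    }
}
```

One thing to keep in mind: URLDecoder follows the application/x-www-form-urlencoded rules, so it turns a literal + into a space. encodeURIComponent writes spaces as %20 and a plus sign as %2B, so its output decodes cleanly here.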
JavaScript's escape is URL encoding your string using the Latin-1 charset. Java is URL encoding it using UTF-8.
URL encoding is really just replacing the characters/bytes that it doesn't treat as safe with %XX escapes. For example, even if you were to stick with ASCII characters, ( would be encoded as %28. You have the additional problem of character sets when you start using non-ASCII characters (anything that doesn't fit in 7 bits), because the same character becomes different bytes in different charsets.
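The difference can be reproduced entirely in Java by passing the charset name explicitly. This is only a sketch with made-up helper names (encode, decode); it shows that encoding with ISO-8859-1 yields the same string the question got from escape, and that the string only decodes correctly when the decoder is given the same charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CharsetMismatch {
    // "Produktgröße", spelled with Unicode escapes.
    static final String TEXT = "Produktgr\u00f6\u00dfe";

    // Hypothetical wrappers that turn the checked exception into an unchecked one.
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("(", "UTF-8"));         // %28
        System.out.println(encode(TEXT, "ISO-8859-1"));   // Produktgr%F6%DFe
        System.out.println(encode(TEXT, "UTF-8"));        // Produktgr%C3%B6%C3%9Fe

        // Matching charset: the original text comes back.
        System.out.println(TEXT.equals(decode("Produktgr%F6%DFe", "ISO-8859-1"))); // true

        // Mismatched charset: F6 DF is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        System.out.println(decode("Produktgr%F6%DFe", "UTF-8").contains("\uFFFD")); // true
    }
}
```

So on the Java side there are two ways out: decode with "ISO-8859-1" if the client has to keep using escape, or have the client send UTF-8 and keep decoding with "UTF-8".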