I'm looking for an explanation of how App Engine deals with character encodings. I'm working on a client-server application where the server runs on App Engine.
This is a new application built from scratch, so we're using UTF-8 everywhere. The client sends some strings to the server through POST, x-www-form-urlencoded. I receive them and echo them back. When the client gets it back, it's ISO-8859-1! I also see this behavior when POSTing to the blobstore, with the parameters sent as UTF-8, multipart/form-data encoded.
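For reference, this is what happens to a form value on the way in: the body is percent-decoded, and the resulting bytes are interpreted with the request's character encoding. In the Servlet API, `HttpServletRequest.setCharacterEncoding` only takes effect if it is called before the first `getParameter` call. The sketch below uses `java.net.URLDecoder` (Java 10+ `Charset` overload) as a stand-in for the container's parser, which is an assumption for illustration, to show what the two charsets would produce for a UTF-8 client.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {
    // Decode one x-www-form-urlencoded value, interpreting the bytes with the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // "é" (U+00E9) percent-encoded as its UTF-8 bytes C3 A9, as a UTF-8 client sends it.
        String body = "name=%C3%A9";
        String value = body.substring(body.indexOf('=') + 1);

        // Interpreted as UTF-8: the original "é".
        System.out.println(decode(value, StandardCharsets.UTF_8));
        // Interpreted as ISO-8859-1: two characters, "Ã©" (this would be mojibake).
        System.out.println(decode(value, StandardCharsets.ISO_8859_1));
    }
}
```

Since no mojibake shows up in the echoed value, the incoming parameter is evidently being decoded along the UTF-8 path; the conversion happens on the way out.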
For the record, I'm seeing this in Wireshark. So I'm 100% sure I send UTF-8 and receive ISO-8859-1. Also, I'm not seeing mojibake: the ISO-8859-1 encoded strings are perfectly fine. This is also not an issue of misinterpreting the Content-Type. It's not the client. Something along the way is correctly recognizing I'm sending UTF-8 parameters, but is converting them to ISO-8859-1 for some reason.
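To make the wire-level distinction concrete: the same string yields different bytes depending on the charset, which is how a capture can show a correctly encoded ISO-8859-1 response with no mojibake at all. A small JDK-only sketch; the sample name is made up.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytesDemo {
    // Render bytes as space-separated uppercase hex, roughly how a packet capture shows a payload.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9"; // "José"
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));      // 4A 6F 73 C3 A9
        System.out.println(hex(name.getBytes(StandardCharsets.ISO_8859_1))); // 4A 6F 73 E9
    }
}
```

A single `E9` byte where `C3 A9` was sent is the signature of the ISO-8859-1 response described above.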
I'm led to believe ISO-8859-1 is the default character encoding for the GAE servlets. My question is, is there a way to tell GAE not to convert to ISO-8859-1 and instead use UTF-8 everywhere?
Let's say the servlet does something like this:
```java
public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    resp.setContentType("application/json");
    String name = req.getParameter("name");
    // Naive JSON building: the value is not escaped.
    String json = "{\"name\":\"" + name + "\"}";
    // Writes the string through the raw ServletOutputStream.
    resp.getOutputStream().print(json);
}
```
I tried setting the character encoding of the response and request to "UTF-8", but that didn't change anything.
Thanks in advance,
I see two things you should do.
1) Set the system properties (if you are using them) to UTF-8 in your appengine-web.xml:
```xml
<system-properties>
    <property name="java.util.logging.config.file" value="WEB-INF/logging.properties" />
    <property name="file.encoding" value="UTF-8" />
    <property name="DEFAULT_ENCODING" value="UTF-8" />
</system-properties>
```
That is what I have, but the docs suggest this instead:
```xml
<env-variables>
    <env-var name="DEFAULT_ENCODING" value="UTF-8" />
</env-variables>
```
https://developers.google.com/appengine/docs/java/config/appconfig
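Since it isn't obvious which of the two variants the runtime honours, one option is to log what the JVM actually sees. This sketch uses only standard JDK calls; where you call `report()` from (a servlet, a startup listener) is up to you. Note that on typical JVMs `Charset.defaultCharset()` is determined at startup, so setting `file.encoding` later at runtime generally won't change it.

```java
import java.nio.charset.Charset;

public class EncodingDiagnostics {
    // Collect the values the two appengine-web.xml variants would influence, if they take effect.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", DEFAULT_ENCODING(property)=" + System.getProperty("DEFAULT_ENCODING")
                + ", DEFAULT_ENCODING(env)=" + System.getenv("DEFAULT_ENCODING")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```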
2) Specify the encoding when you set the content type, or it will revert to the default. The setContentType documentation says:
The content type may include the type of character encoding used, for example, text/html; charset=ISO-8859-4.
I'd try
```java
resp.setContentType("application/json; charset=UTF-8");
```
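Why the charset belongs in the content type: the container derives the response encoding from it, and per the Servlet spec falls back to ISO-8859-1 when none is specified. A simplified stand-alone sketch of that lookup; `charsetOf` is a hypothetical helper, not the container's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: extract the charset parameter from a Content-Type value,
    // falling back to ISO-8859-1, the Servlet spec default when no encoding is given.
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="UTF-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/json"));                // ISO-8859-1
        System.out.println(charsetOf("application/json; charset=UTF-8")); // UTF-8
    }
}
```

So `"application/json"` alone, as in the question, leaves the response on the ISO-8859-1 default.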
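You could also write the response through getWriter(), which encodes text using the charset given in the content type. Call setContentType before getWriter(), otherwise the charset won't apply to the writer.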
http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html#getWriter%28%29
http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html#setContentType(java.lang.String)
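The difference between the question's `getOutputStream().print(json)` and a writer, sketched without the servlet API. Assumption: in the reference `javax.servlet.ServletOutputStream`, `print(String)` writes each char as a single byte and rejects chars above U+00FF (it throws `CharConversionException`; the sketch uses `IllegalArgumentException` instead). Some containers override this, so treat it as an assumption about the container in use. A writer, by contrast, encodes with the configured charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class WriterVsPrint {
    // Mimics the reference ServletOutputStream.print(String): one byte per char,
    // so the output is effectively ISO-8859-1 regardless of the content type.
    static byte[] printLikeOutputStream(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c & 0xFF00) != 0) {
                throw new IllegalArgumentException("Not ISO-8859-1: " + c);
            }
            out.write(c); // writes the low 8 bits only
        }
        return out.toByteArray();
    }

    // Mimics a writer obtained after setting "charset=UTF-8": chars are encoded as UTF-8.
    static byte[] writeLikeUtf8Writer(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter w = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            w.print(s);
        } // closing the writer flushes the encoded bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"Jos\u00e9\"}";
        System.out.println(printLikeOutputStream(json).length); // 15: é becomes the single byte E9
        System.out.println(writeLikeUtf8Writer(json).length);   // 16: é becomes C3 A9
    }
}
```

In the servlet itself, the corresponding change would be calling `resp.setContentType("application/json; charset=UTF-8")` and then `resp.getWriter().print(json)` instead of going through `getOutputStream()`.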
For what it's worth, I need UTF-8 for Japanese content and have no trouble. I'm not using a filter or setContentType, though. I am using GWT and #1 above, and it works.