I recently had a problem with the encoding of pages generated by a servlet. It occurred when the servlet was deployed under Tomcat, but not under Jetty. I did a little research and reduced the problem to the following servlet:
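import java.io.IOException;
import java.io.Writer;
import javax.servlet.Servlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TestServlet extends HttpServlet implements Servlet {
    @Override
    public void service(HttpServletRequest request, HttpServletResponse response) throws IOException {
        // Only the MIME type is set here; no charset is specified.
        response.setContentType("text/plain");
        Writer output = response.getWriter();
        output.write("öäüÖÄÜß");
        output.flush();
        output.close();
    }
}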
If I deploy this under Jetty and point the browser at it, it returns the expected result. The data is returned as ISO-8859-1, and if I look at the response headers, Jetty returns:
Content-Type: text/plain; charset=iso-8859-1
The browser detects the encoding from this header. If I deploy the same servlet in Tomcat, the browser shows strange characters. Tomcat also returns the data as ISO-8859-1; the difference is that no header says so. The browser therefore has to guess the encoding, and it guesses wrong.
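To illustrate why the guess goes wrong, here is a minimal sketch using only the JDK. It assumes the client guesses UTF-8, which is just one possible guess; the class and method names are made up for this example and are not part of the servlet above.

```java
import java.nio.charset.StandardCharsets;

public class CharsetGuessDemo {
    // "öäüÖÄÜß", written with Unicode escapes so the encoding of this source file does not matter.
    static final String TEXT = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";

    // The bytes as they go over the wire when the body is encoded as ISO-8859-1.
    static byte[] encodeAsLatin1() {
        return TEXT.getBytes(StandardCharsets.ISO_8859_1);
    }

    // What a client reads when a header tells it the charset (the Jetty case).
    static String decodeWithDeclaredCharset(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // What a client reads when it has to guess and picks UTF-8 (assumed wrong guess).
    static String decodeWithWrongGuess(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = encodeAsLatin1();
        System.out.println(decodeWithDeclaredCharset(body).equals(TEXT)); // true
        System.out.println(decodeWithWrongGuess(body).equals(TEXT));      // false
    }
}
```

The ISO-8859-1 bytes are not valid UTF-8 sequences, so the decoder substitutes replacement characters, which is the kind of garbage the browser ends up displaying.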
My question: is this behaviour of Tomcat correct, or is it a bug? And if it is correct, how can I avoid the problem? Sure, I can always add
response.setCharacterEncoding("UTF-8");
to the servlet, but that means I set a fixed encoding that the browser might or might not understand. The problem is even more relevant if the servlet is accessed by another service rather than a browser. So how should I deal with the problem in the most flexible way?
response.setContentType("text/html;charset=utf-8");
If you don't specify an encoding, the Servlet specification requires ISO-8859-1. However, AFAIK it does not require the container to include the encoding in the Content-Type header, at least not if you set the content type to just "text/plain". This is what the spec says:
Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute.
In other words, only if you set the content type like this
response.setContentType("text/plain; charset=XXXX")
is Tomcat required to set the charset. I haven't tested whether this works, though.
In general, I would recommend always setting the encoding to UTF-8 (it causes the least trouble, at least in browsers) and, for text/plain, stating the encoding explicitly so that browsers don't fall back to a system default.
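As a rough sketch of that recommendation, the following JDK-only example builds a content type that names the charset and encodes the body with that same charset. The class and helper names are hypothetical. In a servlet, the resulting string would be passed to response.setContentType(...) before calling getWriter(), so that the writer uses the declared encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Hypothetical helper: builds the value to pass to response.setContentType(...).
    static String contentType(String mimeType, Charset charset) {
        return mimeType + "; charset=" + charset.name();
    }

    // Encodes the body with the same charset that the header declares,
    // so the header and the bytes can never disagree.
    static byte[] encodeBody(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset charset = StandardCharsets.UTF_8;
        String text = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df"; // "öäüÖÄÜß"

        System.out.println(contentType("text/plain", charset)); // text/plain; charset=UTF-8

        // A client that honours the header decodes the body back to the original text.
        byte[] body = encodeBody(text, charset);
        System.out.println(new String(body, charset).equals(text)); // true
    }
}
```

Keeping the charset in one variable and using it for both the header and the body is the point of the sketch: the client never has to guess.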