A simple HTML file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form method="post" action="test.jsp" accept-charset="utf-8" enctype="application/x-www-form-urlencoded">
<input type="text" name="P"/>
<input type="submit" value="subMit"/>
</form>
</body>
</html>
The HTML file is served with the header Content-Type: text/html; charset=utf-8. Everything says: "dear browser, when you post this form, please post it UTF-8 encoded". The browser actually does this: every value entered in the input field is UTF-8 encoded. BUT the browser won't tell this to the server! The HTTP header of the POST request contains a Content-Type: application/x-www-form-urlencoded field, but the charset is omitted (tested with FF3.6 and IE8).
The problem is that the application server I use (Tomcat 6) expects the charset in the Content-Type header (as stated in RFC 2388), like this: Content-Type: application/x-www-form-urlencoded; charset=utf-8. If the charset is omitted, it assumes ISO-8859-1, which is not the charset used for encoding. The result is broken data.
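The broken data described above can be reproduced outside Tomcat with the JDK alone. The following is a minimal sketch, not Tomcat's actual decoding code: the class and method names are made up for illustration, and it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder. It percent-encodes a value as UTF-8, the way the browser does, and then decodes it under both charsets.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class FormCharsetDemo {
    // What the browser does: percent-encode the UTF-8 bytes of the value.
    static String browserEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // What a server does when it assumes a given charset for the body.
    static String serverDecode(String encoded, Charset assumed) {
        return URLDecoder.decode(encoded, assumed);
    }

    public static void main(String[] args) {
        String wire = browserEncode("\u00e9"); // U+00E9, "e" with acute accent
        System.out.println(wire);              // %C3%A9

        String right = serverDecode(wire, StandardCharsets.UTF_8);
        String wrong = serverDecode(wire, StandardCharsets.ISO_8859_1);

        // Decoding as UTF-8 restores the single character U+00E9.
        System.out.println(right.equals("\u00e9"));
        // Decoding as ISO-8859-1 yields two characters, U+00C3 and U+00A9.
        System.out.println(wrong.equals("\u00c3\u00a9"));
    }
}
```

The second result is the familiar two-character garbage that appears whenever UTF-8 bytes are read as ISO-8859-1.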
Does someone have a clue how to force current browsers to append the charset to the Content-Type header?
Accept-Charset is an HTTP request header. The user agent uses it to tell the server which character encodings it accepts for the response.
Content-Type in HTML forms: in a POST request resulting from an HTML form submission, the Content-Type of the request is specified by the enctype attribute on the <form> element.
In MIME, the primary subtype of the "text" type is "plain", which indicates plain (unformatted) text. A "charset" parameter may be used to indicate the character set of the body text. The default Content-Type for Internet mail is "text/plain; charset=us-ascii".
Does someone have a clue how to force current browsers to append the charset to the Content-Type header?
No, no browser has ever supplied a charset parameter with the application/x-www-form-urlencoded media type. What's more, the HTML spec, which defines that type, does not propose a charset parameter, so the server can't reasonably expect to get one.
(HTML4 does expect a charset for the subparts of a multipart/form-data submission, but even in that case no browser actually complies.)
accept-charset="utf-8"
accept-charset is broken in IE and shouldn't be used. It won't make a difference either way for forms in pages served as UTF-8, but in other cases it can end up with inconsistent results.
With forms you just have to serve the page they're in as UTF-8, and the results should come back as UTF-8, with no identifying marks to tell you that (except potentially for the _charset_ hack, but Tomcat doesn't support that).
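For illustration, here is a rough sketch of how server-side code could honour the _charset_ hack if it parsed the raw request body itself. It assumes the form carries a hidden field named _charset_ that the browser fills in with the encoding it used. The class and method names are hypothetical, this is not something Tomcat does, and it assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Tomcat or the Servlet API.
class CharsetFieldParser {
    private static final String MARKER = "_charset_=";

    // Parses an application/x-www-form-urlencoded body. If a _charset_
    // field names a supported charset, it is used for decoding;
    // otherwise the fallback is used.
    static Map<String, String> parse(String body, Charset fallback) {
        String[] pairs = body.split("&");
        Charset charset = fallback;

        // First pass: find _charset_. Charset names are plain ASCII,
        // so they can be decoded before the real charset is known.
        for (String pair : pairs) {
            if (pair.startsWith(MARKER)) {
                String name = URLDecoder.decode(
                        pair.substring(MARKER.length()), StandardCharsets.US_ASCII);
                try {
                    charset = Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported name: keep the fallback.
                }
            }
        }

        // Second pass: decode every name and value with the chosen charset.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(rawName, charset),
                       URLDecoder.decode(rawValue, charset));
        }
        return params;
    }
}
```

With the body _charset_=UTF-8&P=%C3%A9 and an ISO-8859-1 fallback, P decodes to the single character U+00E9. Without the _charset_ field, the fallback applies and the value comes out garbled.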
So you have to tell the Servlet container what encoding to use for parameters if you don't want it to fall back to its default (which is usually wrong). In a limited set of circumstances you may be able to call ServletRequest.setCharacterEncoding() to do this, but this tends to be brittle, and it doesn't work at all for parameters taken from the query string. There's no standardised Servlet-level fix for this, sadly. For Tomcat you usually have to muck about with the server.xml instead of being able to fix it in the app.
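When neither option is available, a commonly seen application-level workaround (a different technique from the two mentioned above, shown only as a sketch) is to undo the wrong decoding after the fact. It assumes the container really did decode the parameter as ISO-8859-1 and that the browser really sent UTF-8; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for repairing a parameter that was decoded
// as ISO-8859-1 although the bytes on the wire were UTF-8.
class ParamRepair {
    static String reinterpretAsUtf8(String misdecoded) {
        // ISO-8859-1 maps each byte to exactly one char (U+0000..U+00FF),
        // so encoding back to ISO-8859-1 recovers the original bytes.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset the browser actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

This is at least as brittle as the alternatives: if the container's default ever changes, or the value was already decoded correctly, the round trip corrupts the data instead of repairing it.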