Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to force browser to set charset in content-type http header

A simple HTML file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form method="POST" action="test.jsp" accept-charset="utf-8" method="post" enctype="application/x-www-form-urlencoded" >
    <input type="text" name="P"/>
    <input type="submit" value="subMit"/>
</form>
</body>
</html>

The HTML file is served by the server using header Content-Type:text/html; charset=utf-8. Everything says: "dear browser when you post this form, please post it utf-8 encoded". The browser actually does this. Every value entered in the input field will be UTF-8 encoded. BUT the browser wont tell this to the server! The HTTP header of the post request will contain a Content-Type:application/x-www-form-urlencoded field but the charset will be omitted (tested with FF3.6 and IE8).

The problem is the application server I use (Tomcat6) expects the charset in the Content-Type header (as stated in RFC2388). Like this: Content-Type:application/x-www-form-urlencoded;charset=utf-8. If the charset is omitted it will assume ISO-8859-1 which is not the charset used for encoding. The result is broken data.

Does some one have a clue how to force the current browsers to append the charset to the Content-Type header?

like image 779
Eduard Wirch Avatar asked Mar 10 '10 17:03

Eduard Wirch


People also ask

How do I set character encoding in HTTP header?

The HTTP Accept-Charset is a request type header. This header is used to indicate what character set are acceptable for the response from the server. The accept-charset header specifies the character encodings which are accepted by the client and this header also allows a user-agent to specify the charsets it supports.

How do you specify Content-Type in HTML?

Content-Type in HTML forms In a POST request, resulting from an HTML form submission, the Content-Type of the request is specified by the enctype attribute on the <form> element.

What is Content-Type charset?

It is the default Content- Type. A "charset" parameter may be used to indicate the character set of the body text. The primary subtype of text is "plain". This indicates plain (unformatted) text. The default Content-Type for Internet mail is "text/plain; charset=us-ascii".


1 Answers

Does some one have a clue how to force the current browsers to append the charset to the Content-Type header?

No, no browser has ever supplied a charset parameter with the application/x-www-form-urlencoded media type. What's more, the HTML spec which defines that type, does not propose a charset parameter, so the server can't reasonably expect to get one.

(HTML4 does expect a charset for the subparts of a multipart/form-data submission, but even in that case no browser actually complies.)

accept-charset="utf-8"

accept-charset is broken in IE, and shouldn't be used. It won't make a difference either way for forms in pages served as UTF-8, but in other cases it can end up with inconsistent results.

No, with forms you just have to serve the page they're in as UTF-8, and the results should come back as UTF-8 (with no identifying marks to tell you that (except potentially for the _charset_ hack, but Tomcat doesn't support that).

So you have to tell the Servlet container what encoding to use for parameters if you don't want it to fall back to its default (which is usually wrong). In a limited set of circumstances you may be able to call ServletRequest.setCharacterEncoding() to do this, but this tends to be brittle, and doesn't work at all for parameters taken from the query string. There's not a standardised Servlet-level fix for this, sadly. For Tomcat you usually have to muck about with the server.xml instead of being able to fix it in the app.

like image 80
bobince Avatar answered Sep 18 '22 12:09

bobince