 

accept-charset="UTF-8" parameter doesn't do anything when used in form

I am using the accept-charset="utf-8" attribute in a form and found that when I do a form post with non-ASCII characters, the headers have different accept charset option in the request header. Is there anything I am missing? My form looks like this:

<form method="post" action="controller" accept-charset="UTF-8">
  <input type="text" name="text">
  <input type="submit" value="Submit">
</form>

Thanks in advance

asked Oct 11 '12 00:10 by insomiac


People also ask

What is accept charset in form?

The accept-charset attribute specifies the character encodings to be used for the form submission. In HTML 4.01 the default value was the reserved string "UNKNOWN", which user agents may interpret as the encoding of the document containing the <form> element; HTML5 likewise defaults to the document's own encoding.

Does UTF-8 need meta charset?

Browsers fall back to a default encoding when a page does not declare one, but that default is not guaranteed to be UTF-8: it often depends on the browser and locale (for example, a legacy encoding such as windows-1252). It is therefore better to declare the encoding explicitly with a <meta charset="utf-8"> tag in your HTML file.

What is a charset UTF-8?

UTF-8 is a variable-width character encoding for Unicode. ASCII characters are encoded as the same single bytes as in ASCII, while other characters, such as Greek letters or Chinese characters, take two to four bytes each. As of the mid-2020s, UTF-8 is by far the most widely used encoding on the web.
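To make the variable-width point concrete, here is a small Python sketch that encodes a few sample characters and reports how many bytes each takes in UTF-8. Python is only a convenient choice for illustration, and the sample characters are arbitrary.

```python
# Arbitrary sample characters: Latin A, e-acute, Greek alpha, a CJK ideograph, an emoji.
SAMPLES = ["A", "\u00e9", "\u03b1", "\u4e2d", "\U0001F600"]


def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


if __name__ == "__main__":
    # ASCII characters encode to the identical single byte in UTF-8.
    assert utf8_bytes("A") == "A".encode("ascii")
    for ch in SAMPLES:
        encoded = utf8_bytes(ch)
        # Show the code point, the byte count and the bytes in hex.
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Each line of output pairs a code point with its byte sequence, e.g. `U+03B1 -> 2 byte(s): ce b1` for the Greek alpha.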

What does UTF-8 do in HTML?

Unicode enables processing, storage, and transport of text independent of platform and language. HTML5 recommends UTF-8 as the character encoding for documents, typically declared with <meta charset="utf-8">.


1 Answer

The question, as asked, is self-contradictory: the heading says that the accept-charset parameter does not do anything, whereas the question body says that when the accept-charset attribute (this is the correct term) is used, “the headers have different accept charset option in the request header”. I suppose a negation is missing from the latter statement.

Browsers send an Accept-Charset header in HTTP requests according to their own principles and settings. For example, my Chrome sends Accept-Charset: windows-1252,utf-8;q=0.7,*;q=0.3. Such a header is typically ignored by server-side software, but it could be used (and it was designed to be used) to determine which encoding to use in the server response, provided that the server-side software (a form handler, in this case) is capable of producing responses in different encodings.
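As an illustration of how a form handler could use that header, here is a minimal Python sketch of q-value based selection. The function names and the list of supported charsets are hypothetical, the parser is deliberately simplified (it ignores quoted strings and other corners of the HTTP grammar), and a charset missing from the header, with no `*` entry, is treated as unacceptable. Note also that RFC 9110 deprecates Accept-Charset, and current browsers generally no longer send it.

```python
from typing import Dict, List, Optional


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Map each listed charset (lowercased) to its q-value; q defaults to 1.0."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        charset = parts[0].lower()
        if not charset:
            continue  # skip empty list elements such as a trailing comma
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[charset] = q
    return prefs


def choose_charset(header: str, supported: List[str]) -> Optional[str]:
    """Pick the supported charset with the highest q-value, or None if none is acceptable.

    `supported` is in the server's own order of preference; ties keep the earlier entry.
    """
    if not header.strip():
        # No preferences expressed: any charset is acceptable.
        return supported[0] if supported else None
    prefs = parse_accept_charset(header)
    wildcard_q = prefs.get("*", 0.0)
    best: Optional[str] = None
    best_q = 0.0
    for charset in supported:
        # Use the explicit weight if listed, otherwise the "*" weight (0 if absent).
        q = prefs.get(charset.lower(), wildcard_q)
        if q > best_q:
            best, best_q = charset, q
    return best
```

With the Chrome header quoted above, `choose_charset(header, ["utf-8", "windows-1252"])` returns `"windows-1252"`, because that charset carries the implicit weight 1.0.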

The accept-charset attribute of a form element is not expected to affect HTTP request headers, and it does not. It is meant to specify the character encoding to be used for the form data in the request, and this is what it actually does. The HTML 4.01 spec is obscure about this, but the W3C HTML5 draft puts it much better, though for some odd reason it uses the plural: “gives the character encodings that are to be used for the submission”. I suppose the reason is that you can specify alternate encodings, to prepare for situations where a browser is unable to use your preferred encoding. And what actually happens, in Chrome for example, is that if you use accept-charset="foobar utf-8", then UTF-8 is used.
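The fallback behaviour described above can be sketched roughly like this in Python: take the first whitespace-separated label that names a known encoding. Two assumptions to flag: Python's codec registry stands in for the browser's list of encoding labels (the two sets are not identical), and the UTF-8 fallback when no label is recognized reflects my reading of the current HTML standard rather than tested browser behaviour. `pick_form_encoding` is a hypothetical name.

```python
import codecs
from typing import Optional


def pick_form_encoding(accept_charset: Optional[str], document_encoding: str) -> str:
    """Return the encoding a form submission would use (simplified model)."""
    if accept_charset is None:
        # No accept-charset attribute: submit in the page's own encoding.
        return document_encoding
    for label in accept_charset.split():  # labels are whitespace-separated
        try:
            # First recognized label wins; unknown labels such as "foobar" are skipped.
            return codecs.lookup(label).name
        except LookupError:
            continue
    # Attribute present but no label recognized: assumed UTF-8 fallback.
    return "utf-8"
```

So `pick_form_encoding("foobar utf-8", "iso-8859-1")` skips the unknown `foobar` and returns `"utf-8"`, matching the Chrome behaviour described above.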

In practice, the attribute is used to make the encoding of the submitted data different from the encoding of the page containing the form. Suppose your page is ISO-8859-1 encoded and someone types Greek or Hebrew letters into your form. Browsers will have to do some error recovery, since those characters cannot be represented in ISO-8859-1. (In practice they turn the characters into numeric character references, which is logically all wrong but pragmatically perhaps the best they can do.) Using <form accept-charset=utf-8> helps here: no matter what the page encoding is, the form data will be sent UTF-8 encoded, and UTF-8 can represent any character.
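Python's urllib can imitate both outcomes for the Greek example. The `xmlcharrefreplace` error handler produces the same kind of decimal numeric character references that browsers fall back to, so encoding one value as ISO-8859-1 and as UTF-8 shows the difference the form handler sees. This approximates browser behaviour for illustration only, and `encode_form` is a hypothetical helper.

```python
from urllib.parse import urlencode


def encode_form(fields: dict, charset: str) -> str:
    """Encode fields as application/x-www-form-urlencoded in the given charset.

    Characters the charset cannot represent become decimal numeric character
    references such as &#945;, imitating browser error recovery.
    """
    return urlencode(fields, encoding=charset, errors="xmlcharrefreplace")
```

For example, `encode_form({"q": "αβ"}, "iso-8859-1")` yields `q=%26%23945%3B%26%23946%3B`, that is, the literal text `&#945;&#946;`, whereas with `"utf-8"` it yields `q=%CE%B1%CE%B2`, which `urllib.parse.parse_qs(..., encoding="utf-8")` decodes back to αβ.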

If you wish to tell the form handler which encoding it should use in its response, you can add a hidden (or visible) field to the form for that.
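A minimal Python sketch of that idea on the server side. It assumes a hypothetical hidden field named `response_charset` (for example `<input type="hidden" name="response_charset" value="iso-8859-1">`), assumes the form itself was submitted as UTF-8, and uses a small, hypothetical whitelist of encodings the handler is willing to produce; unknown values fall back to UTF-8.

```python
from typing import Tuple
from urllib.parse import parse_qs

# Hypothetical whitelist of response encodings this handler supports.
SUPPORTED = {"utf-8", "iso-8859-1", "windows-1252"}


def choose_response(body: str, text: str) -> Tuple[str, bytes]:
    """Return (Content-Type header value, encoded response body)."""
    # The form data is assumed to have been submitted UTF-8 encoded.
    fields = parse_qs(body, encoding="utf-8")
    requested = fields.get("response_charset", ["utf-8"])[0].strip().lower()
    charset = requested if requested in SUPPORTED else "utf-8"
    # Characters the chosen charset cannot represent become numeric character references.
    payload = text.encode(charset, errors="xmlcharrefreplace")
    return "text/html; charset=" + charset, payload
```

Checking against a whitelist, rather than copying the submitted value into the Content-Type header, keeps a user-supplied string from ending up verbatim in a response header.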

answered Oct 22 '22 07:10 by Jukka K. Korpela