How is character encoding specified in a multipart/form-data HTTP POST request?

Tags:

html

post

multipartform-data

utf-8

The HTML 5 specification describes an algorithm for selecting the character encoding to be used in a multi-part form submission (e.g. UTF-8). However, it is not clear how the selected encoding should be relayed to the server so that the content can be properly decoded on the receiving end.

Often, character encodings are represented by appending a "charset" parameter to the value of the Content-Type request header. However, this parameter does not appear to be defined for the multipart/form-data MIME type:

https://www.rfc-editor.org/rfc/rfc7578#section-8

Each part in a multipart form submission may provide its own Content-Type header; however, RFC 7578 notes that "in practice, many widely deployed implementations do not supply a charset parameter in each part, but rather, they rely on the notion of a 'default charset' for a multipart/form-data instance".

RFC 7578 goes on to suggest that a hidden "_charset_" form field can be used for this purpose. However, neither Safari (9.1) nor Chrome (51) appears to populate this field, nor does either browser provide any per-part encoding information.

I've looked at the request headers produced by both browsers and I don't see any obvious character encoding information. Does anyone know how the browsers are conveying this information to the server?

215

asked Jun 23 '16 18:06

Greg Brown

1 Answer

HTML 5 uses RFC 2388 (obsoleted by RFC 7578); however, HTML 5 explicitly removes the Content-Type header from non-file fields, while the RFCs do not:

The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).

The RFCs are designed to allow multipart/form-data to be usable in contexts other than HTML (though HTML is its most common use). In those other contexts, a per-part Content-Type header is allowed. It is only disallowed in HTML 5 (HTML 4 allows it).

Without a Content-Type header, the hidden _charset_ form field, if present, is the only way an HTML 5 <form> submitter can explicitly state which charset is used.
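To make the wire format concrete, here is a minimal Python sketch of how text fields could be serialized under the rule quoted above: no per-part Content-Type, and names and values encoded in the selected charset. The function name `build_form_data` and the boundary string are made up for illustration. The `_charset_` handling follows the HTML form-submission algorithm, where a hidden field with that name has its value replaced by the encoding in use. Quoting of special characters in field names is left out to keep it short.

```python
def build_form_data(fields, charset="utf-8", boundary="----sketchboundary"):
    """Serialize (name, value) text fields as a multipart/form-data body.

    Illustrative only: no file parts, no escaping of quotes in names.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields:
        # A hidden field named _charset_ gets the selected encoding as its value.
        if name == "_charset_":
            value = charset
        lines.append(delimiter)
        # Only Content-Disposition is written; there is no Content-Type header.
        lines.append(
            b'Content-Disposition: form-data; name="' + name.encode(charset) + b'"'
        )
        lines.append(b"")
        lines.append(value.encode(charset))
    lines.append(delimiter + b"--")
    lines.append(b"")
    return b"\r\n".join(lines)


body = build_form_data([("_charset_", ""), ("name", "Zoë")], charset="iso-8859-1")
print(body.decode("iso-8859-1"))
```

Apart from the `_charset_` part, nothing in the resulting body tells the receiver which encoding was used.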

Per the HTML 5 algorithm spec that you linked to, the chosen charset MUST be selected from the <form> element's accept-charset attribute if present; otherwise it is the charset of the HTML document itself, if that charset is ASCII-compatible; otherwise it is UTF-8. This is explicitly stated in the algorithm spec, as well as in RFC 7578 Section 5.1.2 when referring to HTML 5.
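The selection order can be sketched roughly as follows. This is a simplified approximation in Python, not the spec's exact algorithm: `pick_form_charset` and `is_ascii_compatible` are hypothetical names, Python's codec registry stands in for the browser's list of supported encodings, and the ASCII-compatibility check is a heuristic of my own.

```python
import codecs

_ASCII = bytes(range(128))


def is_ascii_compatible(label):
    """Heuristic (an assumption of this sketch): the codec must exist and
    must encode every ASCII character to the identical single byte."""
    try:
        return _ASCII.decode("ascii").encode(label) == _ASCII
    except (LookupError, UnicodeError):
        return False


def pick_form_charset(accept_charset, document_charset):
    """Return the Python codec name a form submission would use."""
    if accept_charset:
        # accept-charset is a list of labels; take the first usable one.
        for label in accept_charset.replace(",", " ").split():
            if is_ascii_compatible(label):
                return codecs.lookup(label).name
        return "utf-8"
    # No accept-charset: fall back to the document's own encoding.
    if document_charset and is_ascii_compatible(document_charset):
        return codecs.lookup(document_charset).name
    return "utf-8"


print(pick_form_charset("utf-8", "iso-8859-1"))
print(pick_form_charset(None, "utf-16"))
```

Both calls should yield UTF-8: the first because accept-charset takes priority over the document encoding, the second because UTF-16 is not ASCII-compatible.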

So, there really is no need for the charset to be explicitly stated by a web browser since the receiver of the form submission should know which charset(s) to expect by virtue of how the <form> was created, and thus can check for those charset(s) while parsing the submission. If the receiver wants to know the specific charset used, it needs to include a hidden _charset_ field in the <form>.
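On the receiving side, that reasoning might look something like the sketch below. It assumes the multipart body has already been split into raw (name, value) byte pairs by whatever parser you use. `decode_form_fields` and its `expected` parameter are hypothetical names, with `expected` standing for the charset(s) the server knows its own <form> could have produced.

```python
def decode_form_fields(raw_fields, expected=("utf-8",)):
    """Decode raw (name_bytes, value_bytes) pairs from a form submission.

    Prefers the charset named by a _charset_ field, then tries each
    expected charset in order. Returns (charset_used, decoded_pairs).
    """
    declared = None
    for name, value in raw_fields:
        # "_charset_" is plain ASCII, so it can be matched before decoding.
        if name == b"_charset_":
            declared = value.decode("ascii", errors="replace").strip()
            break

    candidates = ([declared] if declared else []) + list(expected)
    for candidate in candidates:
        try:
            decoded = [(n.decode(candidate), v.decode(candidate)) for n, v in raw_fields]
            return candidate, decoded
        except (LookupError, ValueError):
            # Unknown charset label or bytes invalid in it: try the next one.
            continue
    raise ValueError("could not decode form fields with any candidate charset")


fields = [(b"_charset_", b"iso-8859-1"), (b"name", "Zoë".encode("iso-8859-1"))]
print(decode_form_fields(fields))
```

Order matters when listing several expected charsets: single-byte encodings such as ISO-8859-1 accept any byte sequence, so stricter ones like UTF-8 should be tried first.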

169

answered Sep 23 '22 06:09

Remy Lebeau
