An HTTP request might have the Content-Type header:
GET / HTTP/1.1
...
Content-Type: text/xml; charset=utf-8
...
Are there circumstances where the charset component is mandatory? If so, when?
Examples of possible Content-Type headers, not necessarily correct:
Content-Type: text/xml
Content-Type: charset=utf-8
Content-Type: text/xml; charset=utf8
Content-Type:
Standard info:
EDIT NOTE: It seems this reference is obsolete; RFC 7231 is the correct version now, as suggested by @RobbyCornelissen.
The standard says rather little about this (or maybe I am looking in the wrong place): https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
14.17 Content-Type
The Content-Type entity-header field indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.
Content-Type = "Content-Type" ":" media-type
Media types are defined in section 3.7. An example of the field is
Content-Type: text/html; charset=ISO-8859-4
Further discussion of methods for identifying the media type of an entity is provided in section 7.2.1.
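To make the grammar concrete, here is a rough Python sketch that splits a header value into its media type and parameters, following a simplified reading of the RFC 7231 rule `media-type = type "/" subtype *( OWS ";" OWS parameter )`. The function name `parse_content_type` is made up for this example, and quoted parameter values containing `;` or backslash escapes are not handled.

```python
from __future__ import annotations

import re

# tchar from RFC 7230, section 3.2.6: characters allowed in a token
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
MEDIA_TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})")


def parse_content_type(value: str) -> tuple[str, dict[str, str]] | None:
    """Split a Content-Type value into (type/subtype, parameters).

    Returns None if the value is not a well-formed media type.
    Simplification: a ';' inside a quoted parameter value is not supported.
    """
    head, *rest = [part.strip() for part in value.split(";")]
    match = MEDIA_TYPE_RE.fullmatch(head)
    if match is None:
        return None  # e.g. "" or "charset=utf-8": no type/subtype
    # type, subtype and parameter names are case-insensitive
    media_type = f"{match.group(1)}/{match.group(2)}".lower()
    params: dict[str, str] = {}
    for part in rest:
        if not part:
            continue  # tolerate a trailing ';'
        name, sep, val = part.partition("=")
        if not sep:
            return None  # a parameter must have the form name=value
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]  # unwrap a quoted-string (escapes not handled)
        params[name.strip().lower()] = val
    return media_type, params
```

Run against the question's examples, `text/xml` and `text/xml; charset=utf8` parse, while `charset=utf-8` and the empty value are rejected because they have no type/subtype. Note that `utf8` is only syntactically fine; the name registered with IANA is `UTF-8`, so `charset=utf-8` is the safer spelling.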
No, it's not mandatory. Per the HTTP/1.1 specification (RFC 2616, section 7.2.1): "Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body."
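As an illustration of labelling a request body, the sketch below builds (but does not send) a POST request with Python's standard `urllib.request.Request`; the URL is a placeholder. `urllib` stores header names via `str.capitalize()`, which is why the header is read back as `Content-type`.

```python
import urllib.request

# Encode the body explicitly so the charset label matches the bytes
body = "<note>héllo</note>".encode("utf-8")

req = urllib.request.Request(
    "http://example.com/",  # placeholder URL; nothing is sent here
    data=body,
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)
```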
The charset parameter:
Documents transmitted with HTTP that are of type text, such as text/html, text/plain, etc., can send a charset parameter in the HTTP header to specify the character encoding of the document. It is very important to always label Web documents explicitly.
It doesn't matter which you use, but it's easier to type the first one. It also doesn't matter whether you type UTF-8 or utf-8. You should always use the UTF-8 character encoding. (Remember that this means you also need to save your content as UTF-8.)
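A minimal sketch of keeping the label and the bytes in sync: encode the text with one codec and put that same name in the charset parameter. The helper `encode_and_label` is hypothetical; `codecs.lookup` is used here only to show that Python treats `UTF-8` and `utf-8` as the same codec, mirroring the case-insensitivity mentioned above.

```python
from __future__ import annotations

import codecs


def encode_and_label(text: str, media_type: str = "text/html",
                     charset: str = "utf-8") -> tuple[bytes, str]:
    """Encode text with `charset` and build a matching Content-Type value.

    Deriving both from the same name means the label cannot drift away
    from the bytes that are actually sent.
    """
    codec = codecs.lookup(charset)  # raises LookupError for unknown names
    body = text.encode(codec.name)
    return body, f"{media_type}; charset={charset}"
```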
The Content-Type representation header is used to indicate the original media type of the resource (prior to any content encoding applied for sending). In responses, a Content-Type header provides the client with the actual content type of the returned content.
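The "prior to any content encoding" part can be shown with gzip: the body bytes are compressed, but Content-Type still describes the uncompressed HTML, and Content-Encoding says how to get back to it. This is only an illustrative Python sketch using the standard `gzip` module; the headers are kept in a plain dict rather than in any real HTTP library.

```python
import gzip

text = "<p>héllo</p>"
# Content coding applied only for transfer
body = gzip.compress(text.encode("utf-8"))

headers = {
    # Media type and charset of the original, uncompressed representation
    "Content-Type": "text/html; charset=utf-8",
    # How the bytes were encoded for sending
    "Content-Encoding": "gzip",
}

# A recipient undoes the content coding first, then applies the charset
decoded = gzip.decompress(body).decode("utf-8")
```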
See RFC 7231, Appendix B (Changes from RFC 2616):
The default charset of ISO-8859-1 for text media types has been removed; the default is now whatever the media type definition says. Likewise, special treatment of ISO-8859-1 has been removed from the Accept-Charset header field. (Section 3.1.1.3 and Section 5.3.3)
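One way to read this change is as a lookup: use the explicit charset parameter if there is one, otherwise fall back to whatever the media type's own definition says. The Python sketch below assumes a small, hand-written table (`MEDIA_TYPE_DEFAULTS`, hypothetical and deliberately incomplete; check the actual IANA registration for any type you care about) and includes an opt-in flag that reproduces the old RFC 2616 ISO-8859-1 default for text/*.

```python
from __future__ import annotations

# Hypothetical, illustrative table only; not a complete registry.
MEDIA_TYPE_DEFAULTS = {
    "text/plain": "US-ASCII",     # RFC 2046, section 4.1.2
    "application/json": "UTF-8",  # RFC 8259 requires UTF-8, defines no charset
}


def split_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Very loose split into (media type, parameters); no validation."""
    media_type, _, rest = value.partition(";")
    params = {}
    for item in rest.split(";"):
        name, sep, val = item.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def resolve_charset(value: str, legacy_rfc2616: bool = False) -> str | None:
    """Return the charset to assume for a Content-Type value, or None.

    None means this table gives no default and the encoding has to be
    found some other way (e.g. an XML BOM or encoding declaration).
    """
    media_type, params = split_content_type(value)
    if "charset" in params:
        return params["charset"]  # an explicit parameter is used as-is
    if legacy_rfc2616 and media_type.startswith("text/"):
        return "ISO-8859-1"  # RFC 2616 default, removed by RFC 7231
    return MEDIA_TYPE_DEFAULTS.get(media_type)
```

Here `None` means "no default from this table", which for XML types means falling back to the mechanisms described in RFC 7303.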
So it depends on the default character set (encoding) for the given media type. You can look up the media type in the IANA media type registry; for example, the entry for the application/xml media type links to RFC 7303, Section 3:
As many as three distinct sources of information about character encoding may be present for an XML MIME entity: a charset parameter, a BOM (see Section 3.3 below), and an XML encoding declaration (see Section 4.3.3 of [XML]). Ensuring consistency among these sources requires coordination between entity authors and MIME agents (that is, processes that package, transfer, deliver, and/or receive MIME entities).
The use of UTF-8, without a BOM, is RECOMMENDED for all XML MIME entities.
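To see how the three sources can disagree, here is a rough Python sketch that collects the charset parameter, the BOM and the XML encoding declaration, and reports whether they name the same encoding. It does not decide which one wins; for that, follow the precedence rules in RFC 7303 itself. Assumptions: the function names are hypothetical, a body without a BOM is assumed to be in an ASCII-compatible encoding (enough to read the declaration), and UTF-32 BOMs are not handled.

```python
from __future__ import annotations

import codecs
import re
from email.message import Message

# Encoding declaration at the start of the document (EncName from XML 1.0)
XML_DECL_RE = re.compile(
    r"""<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def encoding_sources(content_type: str, body: bytes) -> dict[str, str | None]:
    """Collect the charset parameter, BOM, and XML encoding declaration."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent

    if body.startswith(codecs.BOM_UTF8):
        bom, text = "utf-8", body[3:].decode("utf-8", errors="replace")
    elif body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The utf-16 codec consumes the BOM and picks the byte order from it.
        # Assumption: UTF-32 BOMs are not distinguished here.
        bom, text = "utf-16", body.decode("utf-16", errors="replace")
    else:
        # Assumption: no BOM means an ASCII-compatible encoding
        bom, text = None, body.decode("latin-1")

    match = XML_DECL_RE.match(text)
    declaration = match.group(1) if match else None
    return {"charset": charset, "bom": bom, "declaration": declaration}


def _normalize(name: str) -> str:
    """Map aliases like 'UTF-8' and 'utf8' to one Python codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def is_consistent(sources: dict[str, str | None]) -> bool:
    """True if all sources that are present name the same encoding."""
    names = {_normalize(v) for v in sources.values() if v is not None}
    return len(names) <= 1
```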
So no, it's not mandatory; but if it is omitted, how the encoding can be detected depends on the specific media type.