 

Are character set names case-sensitive in HTTP?

This is a follow-up to "Are HTTP headers case-sensitive?".

In the HTTP Content-Type header, I have seen character set names written in both upper- and lower-case form. For example, for the UTF-8 character set:

Content-Type: text/html; charset=UTF-8
Content-Type: text/html; charset=utf-8

Here are some mixed-case variants (the latter two certainly not being likely in the real world):

Content-Type: text/html; charset=Utf-8
Content-Type: text/html; charset=UtF-8
Content-Type: text/html; charset=uTf-8

Are all forms equally valid? Or, are the client and server applications that ignore the case of the character set name merely being flexible? Alternatively, are those applications that recognize only one representation non-compliant?

Asked by DavidRR on Oct 15 '13 at 21:10




1 Answer

[Here is the result of my research.]

RFC 2616 clause 3.4 says the following:

HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character Set registry [19].

charset = token 

The IANA Character Set registry is now maintained on the IANA website. At the very top of that document, under Note, the second paragraph reads:

The character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters.
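As a rough illustration of the same convention in practice, Python's standard codecs module resolves encoding names without regard to case. This is only a sketch: Python's codec registry is a separate implementation, not the IANA registry itself, and the list of spellings is simply taken from the question.

```python
import codecs

# The spellings from the question, all meant to name the same character set.
names = ["UTF-8", "utf-8", "Utf-8", "UtF-8", "uTf-8"]

# codecs.lookup() finds the codec for a name and exposes its canonical name.
canonical = {codecs.lookup(name).name for name in names}

# If case is ignored, every spelling collapses to one canonical name.
print(canonical)
```

Every spelling resolves to the same codec, which matches the registry's statement that no distinction is made between upper and lower case.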

Conclusion: These two references indicate that case does not matter when specifying a character set name.
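A minimal sketch of what this means for code that reads the header: normalize the charset name before comparing it. The helper name charset_of is hypothetical; it relies on Python's standard email.message.Message, whose get_content_charset() method returns the charset parameter coerced to lower case.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, lowercased.

    Returns None when the value carries no charset parameter.
    (Hypothetical helper, written for illustration only.)
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


variants = [
    "text/html; charset=UTF-8",
    "text/html; charset=utf-8",
    "text/html; charset=Utf-8",
    "text/html; charset=UtF-8",
    "text/html; charset=uTf-8",
]

# All five header values should yield the same normalized charset name.
normalized = {charset_of(value) for value in variants}
print(normalized)
```

Comparing the normalized value, rather than the raw text of the header, keeps an application from rejecting a charset name only because of its letter case.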

Answered by DavidRR on Sep 23 '22 at 02:09
