
Accept and Accept-Charset - Which is superior?

In HTTP, a client can state which media types it accepts in responses using the Accept header, with values such as application/xml. The media type syntax allows parameters, such as charset=utf-8, which indicates that the client accepts that content in the specified character set.

There is also the Accept-Charset header, which lists the character encodings the client accepts.

If both headers are sent and the Accept header contains media types with a charset parameter, which header should the server treat as the superior one?

e.g.:

Accept: application/xml; q=1,
        text/plain; charset=ISO-8859-1; q=0.8
Accept-Charset: UTF-8
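To make the two sources of charset information in this example concrete, here is a minimal Python sketch that splits an Accept-style header value into its items, parameters, and q weights. The function name parse_accept is hypothetical, and the parsing is deliberately naive: it assumes the header has already been unfolded onto one line and that no quoted parameter value contains a comma or semicolon.

```python
def parse_accept(value):
    """Split an Accept or Accept-Charset value into (item, params, q) tuples.

    Naive sketch: assumes no quoted strings containing ',' or ';'.
    """
    entries = []
    for part in value.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        item = pieces[0].lower()
        if not item:
            continue
        params = {}
        q = 1.0  # a missing q weight means full preference
        for piece in pieces[1:]:
            name, _, val = piece.partition("=")
            name = name.strip().lower()
            val = val.strip().strip('"')
            if name == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed weight as unacceptable
            else:
                params[name] = val
        entries.append((item, params, q))
    # Highest weight first; the sort is stable, so ties keep header order.
    entries.sort(key=lambda entry: entry[2], reverse=True)
    return entries


accept = parse_accept("application/xml; q=1, text/plain; charset=ISO-8859-1; q=0.8")
accept_charset = parse_accept("UTF-8")
```

For the headers above, accept lists application/xml first with no parameters, then text/plain carrying charset=ISO-8859-1, while accept_charset holds only utf-8, so the two headers really do name different character sets.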

I've sent a few example requests to various servers using Fiddler to test how they respond:

Examples

W3

Request

GET http://www.w3.org/ HTTP/1.1
Host: www.w3.org
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=utf-8

Google

Request

GET http://www.google.co.uk/ HTTP/1.1
Host: www.google.co.uk
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=ISO-8859-1

StackOverflow

Request

GET http://stackoverflow.com/ HTTP/1.1
Host: stackoverflow.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=utf-8

Microsoft

Request

GET http://www.microsoft.com/ HTTP/1.1
Host: www.microsoft.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html

There doesn't seem to be any consensus around what the expected behaviour is. I am trying to look surprised.
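For anyone who wants to repeat the experiment without Fiddler, the same request can be sketched with Python's standard library. The helper names build_probe and probe are hypothetical, and servers may well answer differently today than they did when the responses above were captured; note also that urlopen follows redirects, so the Content-Type returned belongs to the final response.

```python
import urllib.request


def build_probe(url):
    """Build a GET request carrying the conflicting headers used above."""
    return urllib.request.Request(
        url,
        headers={
            "Accept": "text/html;charset=UTF-8",
            "Accept-Charset": "ISO-8859-1",
        },
    )


def probe(url, timeout=10):
    """Send the request and return the Content-Type header of the response."""
    with urllib.request.urlopen(build_probe(url), timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

Calling probe("http://www.w3.org/") returns whatever Content-Type that server chooses to send, which can then be compared against the two request headers.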

Paul Turner asked Aug 14 '11 08:08


4 Answers

Although you can set a media type in the Accept header, RFC 2616 does not define a charset parameter for that media type anywhere (though it does not forbid one either).

Therefore, if you are implementing an HTTP/1.1-compliant server, you should look at the Accept-Charset header first, and only then check the Accept header for any parameters of your own.
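The lookup order described here could be sketched in Python as follows. This is only one possible reading of the answer, not behaviour mandated by the RFC: the function names are hypothetical, headers is assumed to be a plain dict keyed by the exact header names, and available is assumed to list the charsets the server can produce, with its preferred one first.

```python
def charsets_from_accept_charset(value):
    """Charset names from an Accept-Charset value, best q first, q=0 dropped."""
    weighted = []
    for part in value.split(","):
        name, _, param = part.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        key, _, val = param.partition("=")
        q = 1.0
        if key.strip().lower() == "q":
            try:
                q = float(val)
            except ValueError:
                q = 0.0  # assumption: a malformed weight means "not acceptable"
        if q > 0:
            weighted.append((q, name))
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in weighted]


def charsets_from_accept(value):
    """Charset parameters attached to media ranges in an Accept value."""
    names = []
    for media_range in value.split(","):
        for param in media_range.split(";")[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "charset":
                names.append(val.strip().strip('"').lower())
    return names


def choose_charset(headers, available=("utf-8", "iso-8859-1")):
    """Consult Accept-Charset first, then charset parameters in Accept."""
    sources = (
        charsets_from_accept_charset(headers.get("Accept-Charset", "")),
        charsets_from_accept(headers.get("Accept", "")),
    )
    for candidates in sources:
        for name in candidates:
            if name == "*":
                return available[0]
            if name in available:
                return name
    return available[0]  # nothing matched: fall back to the server default
```

With the headers from the question (Accept-Charset: UTF-8 alongside text/plain; charset=ISO-8859-1 in Accept), choose_charset returns utf-8, because Accept-Charset is consulted first.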

Paulo answered Oct 20 '22 05:10


Read RFC 2616, Sections 14.1 and 14.2. The Accept header does not allow you to specify a charset. You have to use the Accept-Charset header instead.

Remy Lebeau answered Oct 20 '22 03:10


Firstly, the Accept header can take parameters; see RFC 7231, section 5.3.2.

All text/* MIME types can take a charset parameter.

The Accept-Charset header allows a user-agent to specify the charsets it supports.

If the Accept-Charset header did not exist, a user-agent would have to specify each charset parameter for each text/* media type it accepted, e.g.

Accept: text/html;charset=US-ASCII, text/html;charset=UTF-8, text/plain;charset=US-ASCII, text/plain;charset=UTF-8
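As a rough illustration of how quickly that list grows, the following Python sketch builds the combined header value from a list of media types and a list of charsets. The function name expand_accept is hypothetical and exists only for this example.

```python
from itertools import product


def expand_accept(media_types, charsets):
    """Pair every media type with every charset, as a client would have to
    do if Accept-Charset did not exist."""
    return ", ".join(
        f"{media_type};charset={charset}"
        for media_type, charset in product(media_types, charsets)
    )


value = expand_accept(["text/html", "text/plain"], ["US-ASCII", "UTF-8"])
```

Two media types and two charsets already produce the four entries shown above; each additional charset adds one more entry per media type.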
Malcolm Sparks answered Oct 20 '22 04:10


RFC 7231 section 5.3.2 (Accept) clearly states:

Each media-range might be followed by zero or more applicable media type parameters (e.g., charset)

So a charset parameter for each content-type is allowed. In theory a client could accept, for example, text/html only in UTF-8 and text/plain only in US-ASCII.

But it would usually make more sense to state possible charsets in the Accept-Charset header as that applies to all types mentioned in the Accept header.

If those headers’ charsets don’t overlap, the server could send status 406 Not Acceptable.
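A minimal Python sketch of that overlap check follows, assuming a server that really does cross-match the two headers. The function names are hypothetical, q weights are ignored for brevity, and a request is only treated as contradictory when both headers name charsets and none of them coincide.

```python
def charset_params(accept):
    """Charsets named as parameters on media ranges in an Accept value."""
    found = set()
    for media_range in accept.split(","):
        for param in media_range.split(";")[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "charset":
                found.add(value.strip().strip('"').lower())
    return found


def listed_charsets(accept_charset):
    """Charsets listed in an Accept-Charset value (q weights ignored)."""
    found = set()
    for item in accept_charset.split(","):
        name = item.split(";")[0].strip().lower()
        if name:
            found.add(name)
    return found


def status_for(accept, accept_charset):
    """Return 406 when both headers name charsets and none overlap, else 200."""
    in_accept = charset_params(accept)
    in_accept_charset = listed_charsets(accept_charset)
    if "*" in in_accept_charset:
        return 200  # the wildcard matches any charset
    if in_accept and in_accept_charset and not (in_accept & in_accept_charset):
        return 406
    return 200
```

For the request used in the question's tests (Accept: text/html;charset=UTF-8 with Accept-Charset: ISO-8859-1), status_for returns 406 under these assumptions.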

However, I wouldn’t expect fancy cross-matching from a server, for several reasons. It would make the server code more complicated (and therefore more error-prone), while in practice a client would rarely send such requests. Also, nowadays I would expect everything server-side to use UTF-8 and be sent as-is, so there’s nothing to negotiate.

Martin answered Oct 20 '22 04:10