I'm trying to determine whether it is a bug that Python's urllib.urlopen() function omits an HTTP Accept header when making simple REST API requests.
The Facebook Graph API seems to notice whether the header is present or not:
GET /zuck HTTP/1.0
Host: graph.facebook.com
Accept: */*
Without the Accept header, the returned content-type of application/json; charset=UTF-8 becomes text/javascript; charset=UTF-8. That may be a bug in Facebook's REST API or it may be a legitimate response to a missing Accept header.
I notice that command-line tools like curl use Accept: */* by default:
$ curl -v https://graph.facebook.com/zuck
> GET /zuck HTTP/1.1
> User-Agent: curl/7.30.0
> Host: graph.facebook.com
> Accept: */*
Likewise, the Python requests package also uses Accept: */* as a default:
def default_headers():
    return CaseInsensitiveDict({
        'User-Agent': default_user_agent(),
        'Accept-Encoding': ', '.join(('gzip', 'deflate')),
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })
I presume that curl and requests add the default for a reason, but I'm not sure what that reason is.
RFC 2616 for HTTP/1.1 says that */* indicates all media types and that if no Accept header field is present, then it is assumed that the client accepts all media types. This would seem to indicate that Accept: */* is optional and its omission would have no effect. That said, Python is using HTTP/1.0 and the RFCs are silent about the effect of omitting the header.
I would like to determine whether the best practice is to include Accept: */* as curl and requests do, or whether it is okay to omit it as Python's urllib.urlopen() does.
The question is important because I'm in a position to fix urllib.urlopen() if it is determined to be buggy or if it is problematic for use with REST APIs as commonly implemented using HTTP/1.0:
>>> import httplib
>>> httplib.HTTPConnection.debuglevel = 1
>>> import urllib
>>> u = urllib.urlopen('https://graph.facebook.com/zuck')
send: 'GET /zuck HTTP/1.0\r\nHost: graph.facebook.com\r\nUser-Agent: Python-urllib/1.17\r\n\r\n'
The related questions on StackOverflow aren't helpful for this question. "What does 'Accept: */*' mean under Client section of Request Headers?" asks what */* means (we already know that it means all media types), and "Send a curl request with no Accept header?" asks how to omit the Accept header in a curl request. My question focuses on whether you should include */* and whether it is a bug to omit it.
The RFC states:
"The Accept request-header field can be used to specify certain media types which are acceptable for the response."
This means that the header is optional, because it says "can be used".
As you pointed out, the RFC also says:
"If no Accept header field is present, then it is assumed that the client accepts all media types."
This means that a server SHOULD interpret an omitted header the same way as Accept: */*, in the sense that the client accepts all media types in both cases.
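A server that follows this reading could normalise the request before negotiating. The following is only a minimal sketch, assuming the request headers arrive as a plain dict; effective_accept is a hypothetical helper, not part of any library:

```python
def effective_accept(headers):
    """Return the Accept value a server would negotiate against.

    Hypothetical helper for illustration; `headers` is assumed to be a
    plain dict of request header names to values.
    """
    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "accept":
            return value
    # RFC 2616: with no Accept header, the client is assumed to accept
    # all media types, which is what */* expresses.
    return "*/*"


without_header = effective_accept({"Host": "graph.facebook.com"})
with_header = effective_accept({"Host": "graph.facebook.com", "Accept": "*/*"})
print(without_header == with_header)
```

Under this normalisation the two requests from the question are indistinguishable to the negotiation step.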
It is interesting that the Facebook response differs between the two cases, but I guess that is a failure on their part to interpret the protocol correctly. On the other hand, both responses are obviously correct responses to the request (which I find a funny twist).
I have some general thoughts on this issue (which might also contribute to the bugfix discussion):
(1) Postel's law: Be conservative in what you do, be liberal in what you accept from others (often reworded as "Be conservative in what you send, be liberal in what you accept"). You could decide to be more precise and explicitly add Accept: */*. That would help a server that has misinterpreted the protocol (as Facebook probably did) and does not treat a missing header as equivalent to Accept: */*.
(2) Sending Accept: */*, which could be omitted, increases network traffic by 11 bytes for every single request, which might lead to performance issues.
(3) Having Accept: */* as a default in the request might make it hard for developers to get it out of the header in order to save the 11 bytes.
When speaking HTTP/1.1: Even though (1) and (3) speak for fixing urllib, I would probably follow the specification and the performance argument (2) and omit the header. As stated above, Facebook's response is correct in both cases, since they are allowed to set the media type to whatever they like (even though this behaviour seems unintended, weird, and a mistake).
When speaking HTTP/1.0: I would send the Accept header, since you said it is not specified in the HTTP/1.0 RFC, and then I think Postel's law becomes more important. On the other hand, the Accept header is only optional in HTTP/1.0: "The Accept request-header field can be used to indicate a list of media ranges which are acceptable as a response to the request."
Why would you set an optional header by default?
Reading up about proxy servers (like nginx and Varnish) helped me figure out what is going on.
While the presence of an Accept: */* header shouldn't make a difference to a server, it can and likely will make a difference to a proxy server when the response includes a Vary: Accept header. In particular, the proxy server is allowed to cache different results for different or omitted Accept headers.
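To make the caching point concrete, here is a toy sketch of a cache that honours Vary: Accept. ToyVaryCache is a hypothetical class written for illustration and is far simpler than nginx or Varnish; it only assumes that the Accept value (or its absence) becomes part of the cache key:

```python
class ToyVaryCache:
    """Toy cache keyed on URL plus the request's Accept header.

    Illustration only: real proxies handle normalisation, expiry and
    validation, none of which is modelled here.
    """

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(url, request_headers):
        # None stands for "no Accept header sent"; it is a different
        # cache key from an explicit "*/*".
        return (url, request_headers.get("Accept"))

    def put(self, url, request_headers, content_type):
        self._store[self._key(url, request_headers)] = content_type

    def get(self, url, request_headers):
        return self._store.get(self._key(url, request_headers))


cache = ToyVaryCache()
# Content types taken from the responses described in the question.
cache.put("/zuck", {}, "text/javascript; charset=UTF-8")
cache.put("/zuck", {"Accept": "*/*"}, "application/json; charset=UTF-8")

print(cache.get("/zuck", {}))
print(cache.get("/zuck", {"Accept": "*/*"}))
```

Because the omitted header and the explicit */* occupy separate cache entries, nothing forces the two entries to hold the same content type.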
Facebook has updated (and closed off) its API since this question was asked, but at the time, here is the scenario that caused the observed effects. For backwards compatibility reasons, Facebook was using content negotiation and responding with text/javascript; charset=UTF-8 when it got a request that either omitted the Accept header or had a browser-like Accept: text/html;text/*;*/*. However, when it received Accept: */*, it returned the more modern application/json; charset=UTF-8. When a proxy server receives a request without an Accept header, it can give either one of the cached responses; however, when it gets Accept: */*, it always gives the latter (application/json) response.
So here is why you should include the Accept: */* header: If you do, then a caching proxy will always return the same content type. If you omit the header, the response can vary depending on the results of the last user's content negotiation. REST API clients tend to rely on always getting the same content type back every time.
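A client can opt in to this behaviour itself without waiting for a library change. The sketch below uses Python 3's urllib.request (the successor to the urllib module shown in the question); build_request is a hypothetical helper name, and the request is only constructed, not sent, since the endpoint has since been closed off:

```python
import urllib.request


def build_request(url):
    """Build a GET request that states Accept: */* explicitly.

    Hypothetical helper for illustration.
    """
    return urllib.request.Request(url, headers={"Accept": "*/*"})


req = build_request("https://graph.facebook.com/zuck")
print(req.get_header("Accept"))

# To actually send it, pass the object to urlopen:
#     urllib.request.urlopen(req)
```

Setting the header per request keeps the decision with the caller, which also sidesteps the concern about a default that is hard to remove.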
If a service reacts differently for Accept: */* and absent Accept, it is buggy (and you should send a bug report).
Furthermore, having a charset parameter on application/json is a bug as well; this is a media type that doesn’t have a charset parameter.