I have written a Python script to validate URL connectivity from a host. A URL that curl on Linux reports as successful (HTTP 200) is reported as a 403 by the Python (3.6) requests module.
Can someone help me understand why the reported HTTP status codes differ?
curl from the Linux command line:
$ curl -ILs https://www.h2o.ai|egrep ^HTTP
HTTP/1.1 200 OK
Python requests module:
>>> import requests
>>> url = 'https://www.h2o.ai'
>>> r = requests.get(url, verify=True, timeout=3)
>>> r.status_code
403
>>> requests.packages.urllib3.disable_warnings()
>>> r = requests.get(url, verify=False, timeout=3)
>>> r.status_code
403
$ (curl --silent http://www.example.org -o >(cat >&1) -w "%{http_code}" 1>&2) 1>/dev/null
200
$ (curl --silent http://www.example.org -o >(cat >&1) -w "%{http_code}" 1>&2) 2>/dev/null
<!
cURL supports several different protocols, including HTTP and HTTPS, and runs on almost every platform.
By default, curl chooses the request method for you. If you pass just an HTTP URL, as in curl http://example.com, it uses GET. With -d or -F it uses POST, -I causes a HEAD request, and -T makes it a PUT.
To make a GET request with curl, run the curl command followed by the target URL. curl selects the GET method automatically unless you use the -X, --request, or -d command-line option.
It seems that requests sent with the default python-requests/<version> User-Agent are being served the 403 response by the site:
In [98]: requests.head('https://www.h2o.ai', headers={'User-Agent': 'Foo bar'})
Out[98]: <Response [200]>
In [99]: requests.head('https://www.h2o.ai')
Out[99]: <Response [403]>
You can contact the site owner if you want, or just send a different value in the User-Agent header (as I did above).
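If the script checks many URLs, setting the header once on a session may be more convenient than passing it on every call. The following is only a sketch: the CUSTOM_AGENT value and the check helper are names I made up for illustration, and whether a given site accepts that agent string is up to the site.

```python
import requests

URL = "https://www.h2o.ai"  # site from the question
CUSTOM_AGENT = "url-connectivity-check/1.0"  # hypothetical value, pick your own

# What requests sends by default, a string of the form "python-requests/<version>".
default_agent = requests.utils.default_user_agent()

session = requests.Session()
session.headers.update({"User-Agent": CUSTOM_AGENT})

# Build the request without sending it, to confirm which header will go out.
prepared = session.prepare_request(requests.Request("HEAD", URL))
print(default_agent, "->", prepared.headers["User-Agent"])


def check(url, timeout=3):
    """Return the HTTP status for url, or None on a connection-level failure."""
    try:
        # allow_redirects=True mirrors curl -L; requests.head does not
        # follow redirects unless asked to.
        return session.head(url, allow_redirects=True, timeout=timeout).status_code
    except requests.RequestException:
        return None
```

Calling check(URL) then performs the actual network request with the custom header.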
How I debugged this:
I ran curl with the -v (--verbose) option to check the headers being sent, then checked the same for requests using response.request (assuming the response is saved as response).
I did not find any significant difference apart from the User-Agent header; hence, changing the User-Agent header worked as I expected.
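To see how a User-Agent filter produces exactly this symptom without hitting the real site, here is a rough standard-library sketch. The filtering rule in UAFilterHandler is my assumption about what such a site might do, not something confirmed for www.h2o.ai, and the version in the python-requests/2.18.4 string is just an illustrative value.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UAFilterHandler(BaseHTTPRequestHandler):
    """Hypothetical server rule: reject the default python-requests User-Agent."""

    def do_HEAD(self):
        agent = self.headers.get("User-Agent", "")
        status = 403 if agent.startswith("python-requests/") else 200
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Ignore any proxy settings from the environment so the request stays local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def head_status(url, user_agent):
    """Send a HEAD request with the given User-Agent and return the status code."""
    req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})
    try:
        with opener.open(req, timeout=3) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is still available


server = HTTPServer(("127.0.0.1", 0), UAFilterHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/" % server.server_port

blocked = head_status(base, "python-requests/2.18.4")  # illustrative version
allowed = head_status(base, "Foo bar")
server.shutdown()
server.server_close()
print(blocked, allowed)
```

Against this local server the only thing that changes between the two calls is the User-Agent header, which matches what I saw when comparing the curl and requests headers.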