curl and Python requests (GET) report different HTTP status codes

I have written a Python script to validate URL connectivity from a host. A URL that curl on Linux reports as successful (HTTP 200) is reported as 403 by the Python (3.6) requests module.

Can someone help me understand why the reported HTTP status codes differ?

curl from the Linux command line:

$ curl -ILs https://www.h2o.ai|egrep ^HTTP
HTTP/1.1 200 OK

Python requests module:

>>> import requests
>>> url = 'https://www.h2o.ai'
>>> r = requests.get(url, verify=True, timeout=3)
>>> r.status_code
403
>>> requests.packages.urllib3.disable_warnings()
>>> r = requests.get(url, verify=False, timeout=3)
>>> r.status_code
403
asked Jul 10 '18 14:07 by user9074332

1 Answer

It seems the site serves the 403 response to the default python-requests/<version> User-Agent:

In [98]: requests.head('https://www.h2o.ai', headers={'User-Agent': 'Foo bar'})
Out[98]: <Response [200]>

In [99]: requests.head('https://www.h2o.ai')
Out[99]: <Response [403]>
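To check which User-Agent string requests sends when you don't override it, you can print it locally without contacting the server. A minimal sketch; the version suffix depends on the installed requests release:

```python
import requests

# The User-Agent requests sends by default; the suffix is the installed
# requests version, e.g. "python-requests/2.x.y".
default_ua = requests.utils.default_user_agent()
print(default_ua)
```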

You can contact the site owner if you like, or simply send a different User-Agent header (as I did above).
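Applied to the original connectivity script, the workaround could look like the sketch below. It assumes you want every request to carry the overridden header; `make_session` and `check_url` are hypothetical helper names, and the User-Agent string is an arbitrary placeholder. It also mirrors `curl -IL` more closely than `requests.get`: `-I` makes curl send HEAD and `-L` follows redirects, whereas `requests.head()` does not follow redirects unless `allow_redirects=True` is passed.

```python
import requests

def make_session(user_agent="Mozilla/5.0 (compatible; url-checker)"):
    """Hypothetical helper: a Session whose requests all carry a custom User-Agent."""
    session = requests.Session()
    # Replaces the default "python-requests/<version>" value for this session.
    session.headers.update({"User-Agent": user_agent})
    return session

def check_url(session, url, timeout=3):
    """Hypothetical helper: rough equivalent of `curl -ILs URL`.

    Sends HEAD (like -I) and follows redirects (like -L); HEAD requests in
    requests do not follow redirects by default, so it is enabled explicitly.
    """
    response = session.head(url, allow_redirects=True, timeout=timeout)
    return response.status_code
```

For example, `check_url(make_session(), "https://www.h2o.ai")` returns the status code of the final response after redirects. Reusing one Session also keeps connections alive across repeated checks.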


How I debugged this:

I ran curl with the -v (--verbose) option to see the request headers it sends, then inspected the headers requests sent via response.request.headers (assuming the response is saved as response).

I found no significant difference apart from the User-Agent header, so changing the User-Agent header worked as I expected.
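The comparison described above can be partly automated. A minimal sketch, assuming you copy curl's request headers (the `curl -v` lines starting with `> `) into a dict by hand: `header_differences` is a hypothetical helper, and the curl version string is an assumed example value. `Session.prepare_request` builds the request with the session's default headers merged in, without sending it. curl's `Host` header is left out because requests adds `Host` later, at the connection level, so it never appears in the prepared headers.

```python
import requests

def header_differences(ours, theirs):
    """Hypothetical helper: {name: (ours_value, theirs_value)} for headers that differ.

    Names are compared case-insensitively; a header missing on one side shows as None.
    """
    ours_l = {k.lower(): v for k, v in ours.items()}
    theirs_l = {k.lower(): v for k, v in theirs.items()}
    return {
        name: (ours_l.get(name), theirs_l.get(name))
        for name in sorted(ours_l.keys() | theirs_l.keys())
        if ours_l.get(name) != theirs_l.get(name)
    }

# Headers requests would send for a HEAD request, built offline (nothing is sent).
session = requests.Session()
prepared = session.prepare_request(requests.Request("HEAD", "https://www.h2o.ai"))

# Assumed example of curl's request headers, as shown by `curl -v`.
curl_headers = {"User-Agent": "curl/7.58.0", "Accept": "*/*"}

for name, (ours, theirs) in header_differences(prepared.headers, curl_headers).items():
    print(f"{name}: requests={ours!r} curl={theirs!r}")
```

Headers present only on the requests side (such as `Connection`) show up with `None` for curl, so the output separates genuine value mismatches like `User-Agent` from extra headers.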

answered Sep 30 '22 12:09 by heemayl