curl and Python Requests (get) reporting different HTTP status codes

Q: Does curl use HTTP?

cURL supports several different protocols, including HTTP and HTTPS, and runs on almost every platform.

Q: What HTTP method does curl use?

By default you use curl without explicitly saying which request method to use. If you just pass in an HTTP URL, as in curl http://example.com, it will use GET. If you use -d or -F, curl will use POST; -I will cause a HEAD request and -T will make it a PUT.
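The flag-to-method rules above can be sketched as a small lookup. This is only an illustration of the rules as stated here, not curl's actual option parser: the function name infer_curl_method is hypothetical, flags must be passed as separate arguments (a combined form such as -ILs is not expanded), and conflicting flags are not diagnosed. It does show that the question's curl -I command sends HEAD, whereas requests.get sends GET.

```python
# Flags that imply a request method, per the description above.
FLAG_METHODS = {
    "-I": "HEAD", "--head": "HEAD",
    "-d": "POST", "--data": "POST",
    "-F": "POST", "--form": "POST",
    "-T": "PUT", "--upload-file": "PUT",
}


def infer_curl_method(args):
    """Guess the HTTP method curl would use for a list of arguments.

    Simplified sketch: an explicit -X/--request value wins, otherwise the
    first method-implying flag decides, otherwise the default is GET.
    """
    explicit = None
    implied = None
    for i, arg in enumerate(args):
        if arg in ("-X", "--request") and i + 1 < len(args):
            explicit = args[i + 1]
        elif arg in FLAG_METHODS and implied is None:
            implied = FLAG_METHODS[arg]
    return explicit or implied or "GET"


print(infer_curl_method(["http://example.com"]))
print(infer_curl_method(["-I", "-L", "-s", "https://www.h2o.ai"]))
```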

Q: How do I use the curl command to make an HTTP request?

To make a GET request using curl, run the curl command followed by the target URL. curl automatically selects the HTTP GET request method unless you use the -X, --request, or -d command-line option.

Tags: python, curl, python-3.x, python-requests

I have written a Python script to validate URL connectivity from a host. A URL that curl on Linux reports as successful (HTTP 200) is reported as a 403 by the Python (3.6) requests module.

Can someone help me understand why the reported HTTP status codes differ?

curl from the Linux command line:

$ curl -ILs https://www.h2o.ai|egrep ^HTTP
HTTP/1.1 200 OK

Python requests module:

>>> import requests
>>> url = 'https://www.h2o.ai'
>>> r = requests.get(url, verify=True, timeout=3)
>>> r.status_code
403
>>> requests.packages.urllib3.disable_warnings()
>>> r = requests.get(url, verify=False, timeout=3)
>>> r.status_code
403

asked Jul 10 '18 14:07

user9074332

1 Answer

It seems the site serves the 403 response to the default python-requests/<version> User-Agent:

In [98]: requests.head('https://www.h2o.ai', headers={'User-Agent': 'Foo bar'})
Out[98]: <Response [200]>

In [99]: requests.head('https://www.h2o.ai')
Out[99]: <Response [403]>

You can contact the site owner if you want, or just send a different user agent via the User-Agent header (as I did above).
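For a connectivity-check script like the one in the question, the override can be set once on a session instead of on every call. The following is a minimal sketch, not a definitive implementation: the names build_session, check_url and CUSTOM_USER_AGENT and the User-Agent string itself are hypothetical, and whether a given site accepts that string is an assumption you would need to verify.

```python
import requests

# Hypothetical User-Agent string for illustration; use any value the site accepts.
CUSTOM_USER_AGENT = "url-connectivity-check/0.1"


def build_session(user_agent=CUSTOM_USER_AGENT):
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def check_url(session, url, timeout=3):
    """Return the final HTTP status code for url, or None if the request failed."""
    try:
        # HEAD with redirects followed, roughly mirroring `curl -IL`.
        response = session.head(url, allow_redirects=True, timeout=timeout)
    except requests.exceptions.RequestException:
        return None
    return response.status_code


# Example use (performs a real network request, so it is left commented out):
# session = build_session()
# print(check_url(session, "https://www.h2o.ai"))
```

Returning None on any RequestException keeps "could not connect" distinct from "connected but got an error status", which is usually the distinction a connectivity check cares about.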

How I debugged this:

I ran curl with the -v (--verbose) option to check the headers being sent, and then checked the same for requests using response.request (assuming the response is saved as response).

I did not find any significant difference apart from the User-Agent header; hence, changing the User-Agent header worked as I expected.
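The comparison step can be sketched as a case-insensitive diff of two header dictionaries. The helper name diff_headers is hypothetical, and the sample values below are illustrative placeholders rather than output captured from the site; in practice one dictionary would be copied from the curl -v output and the other taken from response.request.headers.

```python
def diff_headers(first, second):
    """Return {header: (first_value, second_value)} for headers that differ.

    Header names are compared case-insensitively, since HTTP header names
    are not case-sensitive. A missing header is reported as None.
    """
    a = {name.lower(): value for name, value in first.items()}
    b = {name.lower(): value for name, value in second.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Illustrative values only (version numbers are placeholders).
curl_headers = {"User-Agent": "curl/7.58.0", "Accept": "*/*"}
requests_headers = {
    "user-agent": "python-requests/2.19.1",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

for name, (curl_value, requests_value) in diff_headers(curl_headers, requests_headers).items():
    print(f"{name}: curl={curl_value!r} requests={requests_value!r}")
```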

answered Sep 30 '22 12:09

heemayl
