python requests http response 500 (site can be reached in browser)

I am trying to figure out what I'm doing wrong here, but I keep getting lost...

In Python 2.7, I'm running the following code:

>>> import requests
>>> req = requests.request('GET', 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu')
>>> req.content
'<html><body><h1>500 Server Error</h1>\nAn internal server error occured.\n</body></html>\n'

If I open this URL in a browser, it responds properly. Digging around, I found a similar question about the urllib library (500 error with urllib.request.urlopen), but I was not able to adapt its answer, and I would prefer to use requests here.

I might be missing a proxy setting, as suggested for example here (Perl File::Fetch Failed HTTP response: 500 Internal Server Error), but can someone explain what the proper workaround is?

Kube Kubow asked Nov 05 '16 19:11


3 Answers

One difference from the browser request is the User-Agent header; you can set it with requests like this:

import requests

url = 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu'
# Present a browser-like User-Agent instead of the default python-requests one
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code)  # should be 200

Edit

Some web applications will also check the Origin and/or the Referer headers (for example for AJAX requests); you can set these in a similar fashion to User-Agent.

headers = {
    'Origin': 'http://example.com',
    'Referer': 'http://example.com/some_page'
}

Remember, you are setting these headers to basically bypass checks so please be a good netizen and don't abuse people's resources.
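
As a rough sketch, the User-Agent, Origin and Referer headers can be combined into a single dictionary and attached to a session so that every request carries them. The helper names `browser_like_headers` and `fetch`, the 10-second timeout and the example.com values are placeholders chosen for illustration, not something any particular site requires.

```python
def browser_like_headers(user_agent, origin=None, referer=None):
    """Build a headers dict, leaving out any optional header that was not given."""
    headers = {'User-Agent': user_agent}
    if origin is not None:
        headers['Origin'] = origin
    if referer is not None:
        headers['Referer'] = referer
    return headers


def fetch(url, headers):
    """Send a GET request with the given headers and return the response."""
    # Imported here so the helper above stays usable without requests installed.
    import requests

    session = requests.Session()
    session.headers.update(headers)  # applied to every request made through this session
    return session.get(url, timeout=10)


# Hypothetical values for illustration only.
example_headers = browser_like_headers(
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/54.0.2840.90 Safari/537.36',
    origin='http://example.com',
    referer='http://example.com/some_page',
)
```

Calling `fetch(url, example_headers)` would then send all three headers; a session is handy when several pages on the same site need the same set.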

Ionut Ticus answered Oct 27 '22 12:10


But Wait! There's More!

The other answers did help me on the path to resolution, but I had to add still more to my headers before certain sites would let me in using python requests. Learning how to use Wireshark (suggested in another answer) was a good new skill for me, but I found an easier way.

Open your browser's developer tools (in Chrome, right-click and choose Inspect), go to the Network tab, select one of the entries under Name on the left, and expand Request Headers under Headers. That gives you the complete list of headers your browser is sending to the server. I added the headers I thought were most likely needed one at a time, testing after each, until my errors went away. Then I reduced that set to the smallest one that still worked. In my case I was already sending only User-Agent to deal with other issues, and I only needed to add the Accept-Language key to get a few more sites working. See the picture below as a guide to the text above.

I hope this process helps others to find ways to eliminate undesirable python requests return codes where possible.
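
The trial-and-error reduction can be automated. The sketch below is one possible greedy approach, not the exact procedure I followed by hand: it starts from the full set copied out of the browser and drops each header that turns out not to be needed. The names `minimal_headers` and `fake_server_accepts`, and the header values, are made up for illustration; the fake check stands in for a real request so the example runs without network access. A single greedy pass is not guaranteed to find the smallest possible set in every case.

```python
def minimal_headers(candidates, works):
    """Return a subset of `candidates` that still satisfies `works`.

    `works` is any callable that takes a headers dict and returns True
    when a request made with those headers succeeds.
    """
    if not works(candidates):
        raise ValueError('the full header set does not work either')
    kept = dict(candidates)
    for name in list(candidates):
        # Try the request without this one header.
        trial = {k: v for k, v in kept.items() if k != name}
        if works(trial):
            kept = trial  # the header was not needed, so drop it for good
    return kept


# Stand-in for a real server: pretend it insists on these two headers.
def fake_server_accepts(headers):
    return 'User-Agent' in headers and 'Accept-Language' in headers


# Hypothetical headers, as they might be copied from the Network tab.
copied_from_browser = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
}

needed = minimal_headers(copied_from_browser, fake_server_accepts)
print(sorted(needed))  # ['Accept-Language', 'User-Agent']
```

Against a real site, `works` could be something like `lambda h: requests.get(url, headers=h).status_code == 200`; keep the number of test requests small so you do not hammer the server.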

Screen Shot of my Developer/Inspect Window in Chrome

Thom Ives answered Oct 27 '22 12:10


The User-Agent, and also other header elements, could be causing your problem.

When I came across this error, I watched a regular request made by a browser using Wireshark, and it turned out the server expected headers other than just the User-Agent to be present.

After emulating the header sent by the browser in python requests, the server stopped throwing errors.
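
To see which headers to emulate, it can help to list what the browser sent that your script did not. This is a small sketch under assumed inputs: `missing_headers` is a name I made up, and both dictionaries hold sample values rather than a real capture. Header names are compared case-insensitively because HTTP treats them that way.

```python
def missing_headers(browser_headers, client_headers):
    """Return the header names the browser sent that the client did not."""
    sent = {name.lower() for name in client_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Hypothetical capture of a browser request (for example from Wireshark).
browser = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
    'Accept': 'text/html',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
}

# Hypothetical headers sent by the script.
client = {
    'user-agent': 'python-requests',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
}

print(missing_headers(browser, client))  # ['Accept-Language']
```

With requests, `requests.utils.default_headers()` returns the headers it sends by default, and `response.request.headers` shows what was actually sent for a given response, so either can serve as the `client_headers` argument.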

CALL_ME_BETTY_BOB answered Oct 27 '22 13:10