I am trying to figure out what I'm doing wrong here, but I keep getting lost...
In Python 2.7, I'm running the following code:
>>> import requests
>>> req = requests.request('GET', 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu')
>>> req.content
'<html><body><h1>500 Server Error</h1>\nAn internal server error occured.\n</body></html>\n'
If I open this URL in a browser, it responds properly. I dug around and found a similar question about the urllib library (500 error with urllib.request.urlopen), but I am not able to adapt it, and I would prefer to use requests here.
I might be missing some proxy setting, as suggested for example here (Perl File::Fetch Failed HTTP response: 500 Internal Server Error), but can someone explain to me what the proper workaround is?
One thing that differs from the browser request is the User-Agent header; you can set it with requests like this:
import requests

url = 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code)  # should be 200
Some web applications will also check the Origin and/or the Referer headers (for example for AJAX requests); you can set these in a similar fashion to User-Agent.
headers = {
    'Origin': 'http://example.com',
    'Referer': 'http://example.com/some_page'
}
Remember, you are setting these headers to basically bypass checks so please be a good netizen and don't abuse people's resources.
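If several requests need the same headers, one option is to attach them to a requests.Session so they are sent every time. The sketch below is only an illustration: the helper name make_session and the example.com values are placeholders, not anything the site in the question requires. It prepares a request without sending it, so you can inspect the merged headers first.

```python
import requests

# Placeholder values for illustration; substitute what your browser actually sends.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36',
    'Origin': 'http://example.com',
    'Referer': 'http://example.com/some_page',
}


def make_session(headers):
    """Return a Session that adds `headers` to every request it makes."""
    session = requests.Session()
    session.headers.update(headers)
    return session


session = make_session(BROWSER_HEADERS)

# Prepare (but do not send) a request to see the headers that would go out.
request = requests.Request('GET', 'http://example.com/some_page')
prepared = session.prepare_request(request)
print(prepared.headers['User-Agent'])
print(prepared.headers['Referer'])
```

To actually send it, call session.get(url) or session.send(prepared) and check response.status_code as in the snippet above.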
The other answers helped me on the path to a resolution, but I had to add still more things to my headers before certain sites would let me in using Python requests. Learning how to use Wireshark (suggested in another answer) was a good new skill for me, but I found an easier way.
If you open the developer tools (right-click, then click Inspect in Chrome), go to the Network tab, select one of the entries under Name on the left, and expand Request Headers under Headers, you'll get a complete list of what your browser is sending to the server. I started adding the elements I thought were most likely needed, one at a time, testing until my errors went away. Then I reduced that set to the smallest one that still worked. In my case, my headers already contained User-Agent (added to deal with other issues), and I only needed to add the Accept-Language key to get a few other sites working. See picture below as a guide to the text above.
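The reduction step of that process can be automated. The sketch below is a simplified variant, not exactly what I did by hand: it starts from the full set copied from the browser and drops each header that turns out not to be needed. The names minimal_headers and fake_server_accepts are made up for this example, and the fake check stands in for a real request so the snippet runs without touching any site.

```python
def minimal_headers(candidates, works):
    """Greedily drop every header that `works` can do without.

    `candidates` is a dict of headers copied from the browser;
    `works` is a callable that takes a headers dict and returns True
    when the server responds properly.
    """
    needed = dict(candidates)
    if not works(needed):
        raise ValueError('even the full header set does not work')
    for name in list(candidates):
        # Try the current set without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial
    return needed


# Stand-in for a real check; it pretends the server insists on two headers.
def fake_server_accepts(headers):
    return 'User-Agent' in headers and 'Accept-Language' in headers


browser_headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
}
result = minimal_headers(browser_headers, fake_server_accepts)
print(sorted(result))  # ['Accept-Language', 'User-Agent']
```

For a real site, `works` could be something like lambda h: requests.get(url, headers=h).status_code == 200. Keep in mind that this sends one request per header, so go easy on the server.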
I hope this process helps others to find ways to eliminate undesirable python requests return codes where possible.
The User-Agent, and also other header elements, could be causing your problem.
When I came across this error, I watched a regular request made by a browser using Wireshark, and it turned out there were things other than just the User-Agent in the header which the server expected to be there.
After emulating the header sent by the browser in python requests, the server stopped throwing errors.
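To emulate the browser, it can help to paste the captured request text into a small parser instead of retyping each header. This is a rough sketch under a few assumptions: the capture is plain "Name: value" lines, the helper name parse_raw_headers is made up here, and the sample capture is invented rather than taken from the site in the question.

```python
# Headers the HTTP library normally fills in itself from the URL and body.
SKIP = {'host', 'content-length'}


def parse_raw_headers(raw):
    """Turn a pasted block of 'Name: value' lines into a dict for requests."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines, HTTP/2 pseudo-headers such as ':authority',
        # and lines without a colon such as 'GET /some_page HTTP/1.1'.
        if not line or line.startswith(':') or ':' not in line:
            continue
        name, value = line.split(':', 1)  # split once; values may contain ':'
        if name.strip().lower() in SKIP:
            continue
        headers[name.strip()] = value.strip()
    return headers


# Invented capture for illustration only.
captured = """
GET /some_page HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64)
Accept-Language: en-US,en;q=0.9
Referer: http://example.com/some_page
"""
headers = parse_raw_headers(captured)
print(headers)
```

The resulting dict can be passed straight to requests.get(url, headers=headers).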