Python urllib2.HTTPError: HTTP Error 503: Service Unavailable on valid website

I have been using Amazon's Product Advertising API to generate URLs that contain prices for a given book. One URL that I have generated is the following:

http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327

When I click on the link or paste it into the address bar, the web page loads fine. However, when I execute the following code I get an error:

    import urllib2

    url = "http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327"
    html_contents = urllib2.urlopen(url)

The error is urllib2.HTTPError: HTTP Error 503: Service Unavailable. First of all, I don't understand why I get this error at all, since the web page loads successfully in the browser.

Another odd behavior I have noticed is that the following one-line version only sometimes raises the same error:

html_contents = urllib2.urlopen("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327") 

I am totally lost as to why this happens. Is there a fix or workaround for this? My goal is to read the HTML contents of the URL.

EDIT

I don't know why Stack Overflow is rewriting the Amazon link in my code to a rads.stackoverflow link. Please ignore the rads.stackoverflow link and use the link I gave above, between the quotes.

asked Sep 19 '14 14:09 by ruthless




1 Answer

Amazon is rejecting the default User-Agent that urllib2 sends. One workaround is to use the requests module:

    import requests

    page = requests.get("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
    html_contents = page.text
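To see which User-Agent is being sent by default, you can inspect a freshly built opener. The following is a minimal sketch that assumes Python 3, where urllib2 has become urllib.request; the helper name default_user_agent is made up for illustration. It makes no network request.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value a freshly built opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples added to every request
    # made through this opener.
    headers = dict(opener.addheaders)
    return headers.get("User-agent")


if __name__ == "__main__":
    # Prints a value that starts with "Python-urllib/" followed by the version.
    print(default_user_agent())
```

A value starting with Python-urllib/ is easy for a server to recognize as a script rather than a browser, which is consistent with the page loading in a browser but not from the code.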

If you insist on using urllib2, you can override the User-Agent header like this:

    import urllib2

    # Replace the default Python-urllib User-Agent with a browser-like one.
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    response = opener.open('http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327')
    html_contents = response.read()
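For anyone on Python 3, where urllib2 no longer exists, roughly the same idea can be sketched with urllib.request. This is an assumption-laden sketch, not a tested fix for Amazon: the function names build_request and fetch_html, the 10-second timeout, and the choice to re-raise the error are all illustrative.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch the URL and return the raw response body as bytes."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status, e.g. 503 when the server refuses
        # the request; report it and let the caller decide what to do.
        print("Request failed with HTTP status", err.code)
        raise
```

Calling fetch_html(url) with the URL from the question would then return the page body as bytes, which can be decoded to text if needed. Whether a given server accepts the request still depends on that server's own rules.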

Don't worry about Stack Overflow editing the URL. They explain why they do this here.

answered Sep 20 '22 13:09 by Spade