Python urllib2.HTTPError: HTTP Error 503: Service Unavailable on valid website

I have been using Amazon's Product Advertising API to generate URLs that contain prices for a given book. One URL that I have generated is the following:

http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327

When I click on the link or paste it into the address bar, the web page loads fine. However, when I execute the following code, I get an error:

    import urllib2

    url = "http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327"
    html_contents = urllib2.urlopen(url)

The error is urllib2.HTTPError: HTTP Error 503: Service Unavailable. First of all, I don't understand why I even get this error since the web page successfully loads.

Also, another weird behavior that I have noticed is that the following code sometimes does and sometimes does not give the stated error:

html_contents = urllib2.urlopen("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")

I am totally lost as to why this happens. Is there any fix or workaround for this? My goal is to read the HTML contents of the URL.

EDIT

I don't know why Stack Overflow is changing the Amazon link in my code above to rads.stackoverflow. Anyway, ignore the rads.stackoverflow link and use my link above between the quotes.

asked Sep 19 '14 14:09

ruthless

1 Answer

Amazon is rejecting the default User-Agent for urllib2. One workaround is to use the requests module:

    import requests

    page = requests.get("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
    html_contents = page.text
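To see which User-Agent the standard library sends by default, you can inspect a fresh opener without making any request. This is a minimal sketch that assumes Python 3, where `urllib.request` replaces `urllib2`; the helper name `default_user_agent` is made up for illustration.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value a freshly built opener would send."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request
    headers = dict(opener.addheaders)
    return headers["User-agent"]


# Prints a value of the form "Python-urllib/<version>"
print(default_user_agent())
```

A value like this identifies the client as a script rather than a browser, which is consistent with the page loading fine in a browser while the script gets a 503.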

If you insist on using urllib2, this is how a header can be faked to do it:

    import urllib2

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    response = opener.open('http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327')
    html_contents = response.read()
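If you are on Python 3, `urllib2` no longer exists, but the same idea carries over to `urllib.request`. The sketch below is an assumption-laden equivalent, not part of the original answer: the function names `build_request` and `fetch_html`, the 10-second timeout, and the UTF-8 fallback are all illustrative choices.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch a URL and return the body as text.

    Raises urllib.error.HTTPError if the server still answers with an
    error status such as 503.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declared; fall back to UTF-8 (assumption)
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `html_contents = fetch_html(url)` with the URL from the question would then play the role of `opener.open(...).read()` above. Whether a given site accepts the request can still vary, so this may not remove every 503.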

Don't worry about Stack Overflow editing the URL. They have explained why they do this.

answered Sep 20 '22 13:09

Spade
