I get the following error with the code below.
HTTP Error 406: Not Acceptable
This is my first step before I use beautifulsoup to parse the page.
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
url = "http://www.choicemoney.us/retail.php"
response = opener.open(url)
All help greatly appreciated.
urllib2 is a Python module that can be used for fetching URLs. It defines functions and classes to help with URL actions (basic and digest authentication, redirections, cookies, etc.). The magic starts with importing the urllib2 module.
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request. [RFC2616]
Based on the code and what the RFC describes, I assume that you need to set both the name and the value of the User-Agent header correctly, i.e. send a complete browser-style string rather than the bare 'Mozilla/5.0'.
These are examples of complete values:
Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
Just replace the opener.addheaders line in your code with the following:
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A')]
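Putting the suggestion together, a minimal sketch might look like the following. The helper names make_opener and fetch are hypothetical, and the try/except import is an assumption added so the same snippet also runs on Python 3, where urllib2 became urllib.request. Whether the server answers 200 still depends on the site itself.

```python
try:
    import urllib2 as request_lib           # Python 2, as in the question
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as request_lib    # Python 3 equivalent (assumption)
    from urllib.error import HTTPError

# Full browser-style User-Agent string taken from the answer above.
FULL_USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
                   "AppleWebKit/537.75.14 (KHTML, like Gecko) "
                   "Version/7.0.3 Safari/7046A194A")


def make_opener(user_agent=FULL_USER_AGENT):
    """Build an opener whose only default header is the given User-Agent."""
    opener = request_lib.build_opener()
    opener.addheaders = [("User-agent", user_agent)]
    return opener


def fetch(url, opener=None):
    """Return (status_code, body); body is None when the server sends an HTTP error."""
    opener = opener or make_opener()
    try:
        response = opener.open(url)
        return response.getcode(), response.read()
    except HTTPError as err:
        # A 406 (or any other HTTP error status) ends up here.
        return err.code, None


if __name__ == "__main__":
    status, body = fetch("http://www.choicemoney.us/retail.php")
    print(status)
```

Catching HTTPError makes it easy to try different User-Agent strings and compare status codes without the script stopping at the first 406.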
I believe @ipinak's answer is correct. urllib2 actually provides a default User-Agent that works here, so if you delete the line opener.addheaders = [('User-agent', 'Mozilla/5.0')], the response should have status code 200.
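To see which User-Agent is sent when addheaders is left untouched, you can inspect a fresh opener without making any request. This is only a small sketch; the names header_name and header_value are illustrative, and the Python 3 import fallback is an assumption for readers no longer on Python 2.

```python
try:
    import urllib2 as request_lib           # Python 2, as in the question
except ImportError:
    import urllib.request as request_lib    # Python 3 equivalent (assumption)

# A freshly built opener already carries one default header.
opener = request_lib.build_opener()
header_name, header_value = opener.addheaders[0]

# The value starts with "Python-urllib/"; the version suffix depends on the interpreter.
print("%s: %s" % (header_name, header_value))
```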
I recommend the popular requests library for such jobs as its API is much easier to use.
import requests
url = "http://www.choicemoney.us/retail.php"
resp = requests.get(url)
print(resp.status_code)  # 200
print(resp.content)  # can be passed to BeautifulSoup