HTTP Error 406: Not Acceptable Python urllib2

I get the following error when I run the code below:

HTTP Error 406: Not Acceptable

Fetching the page is my first step before I use BeautifulSoup to parse it.

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
url = "http://www.choicemoney.us/retail.php"
response = opener.open(url)

All help greatly appreciated.

asked Jan 16 '16 22:01 by cflanagan17



2 Answers

The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request. [RFC2616]

Based on the code and on what the RFC describes, I assume that you need to set both the key and the value of the User-Agent header correctly. The bare 'Mozilla/5.0' in your code is not a complete browser User-Agent value.

These are examples of complete values:

  • Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11

  • Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36

  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

Replace the opener.addheaders line in your code with the following:

opener.addheaders = [('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A')]
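
Put together, the change might look like the sketch below. The helper names make_opener and fetch are made up for illustration. The try/except import is only there so the same code also runs on Python 3, where urllib2 lives on as urllib.request. Catching urllib2.HTTPError lets you see the status code instead of a traceback if the server still refuses the request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# One of the complete User-Agent values listed above.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
              'AppleWebKit/537.75.14 (KHTML, like Gecko) '
              'Version/7.0.3 Safari/7046A194A')


def make_opener(user_agent=BROWSER_UA):
    """Hypothetical helper: build an opener that sends the given User-Agent."""
    opener = urllib2.build_opener()
    # Assigning to addheaders replaces the opener's default header list.
    opener.addheaders = [('User-agent', user_agent)]
    return opener


def fetch(url, opener):
    """Hypothetical helper: return the response body, or None on an HTTP error."""
    try:
        response = opener.open(url)
    except urllib2.HTTPError as err:
        # err.code holds the status code, e.g. 406.
        print("Request failed: HTTP %d %s" % (err.code, err.msg))
        return None
    return response.read()
```

Calling fetch("http://www.choicemoney.us/retail.php", make_opener()) would then give you either the page body to hand to BeautifulSoup, or None if the server answers with an error status.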
answered Sep 30 '22 15:09 by ipinak


I believe @ipinak's answer is correct.

urllib2 also sends a default User-Agent of its own, and that one works here. If you delete the line opener.addheaders = [('User-agent', 'Mozilla/5.0')], the response should come back with status code 200.
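
If you want to see which User-Agent urllib2 sends when you do not override it, you can inspect a fresh opener. This is only a small sketch. The import fallback is an addition for readers on Python 3, where the module is called urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

opener = urllib2.build_opener()

# A new opener starts with a single default header: a User-agent entry
# whose value has the form "Python-urllib/<major.minor>".
default_headers = list(opener.addheaders)
print(default_headers)
```

No request is made here, so this says nothing about how the server responds. It only shows what would be sent.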

For jobs like this I recommend the popular requests library, as its API is much easier to use.

import requests

url = "http://www.choicemoney.us/retail.php"
resp = requests.get(url)
print(resp.status_code)  # 200
print(resp.content)  # the page body, which you can pass to BeautifulSoup
answered Sep 30 '22 16:09 by ohw