urllib2 HTTP Error 400: Bad Request

I have a piece of code like this

    host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (query, page)
    req = urllib2.Request(host)
    req.add_header('User-Agent', User_Agent)
    response = urllib2.urlopen(req)

When I input a query of more than one word, such as "the dog", I get the following error:

        response = urllib2.urlopen(req)
      File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 400, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/urllib2.py", line 513, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/urllib2.py", line 438, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 521, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 400: Bad Request

Can anyone point out what I'm doing wrong? Thanks in advance.

PyFan asked Jan 12 '12 18:01



2 Answers

The query "the dog" returns a 400 error because the string is not escaped for use in a URL; a literal space is not allowed there.

If you do this:

    import urllib, urllib2

    quoted_query = urllib.quote(query)
    host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (quoted_query, page)
    req = urllib2.Request(host)
    req.add_header('User-Agent', User_Agent)
    response = urllib2.urlopen(req)

It will work.
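
As a minimal sketch of the escaping step on its own, the snippet below wraps it in a small helper. The name `build_search_url` and the `SEARCH_URL` constant are hypothetical (only the URL template comes from the question), and the `try`/`except` import is an assumption added so the same code also runs on Python 3, where `quote` lives in `urllib.parse`.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2, as used in this thread

# URL template copied from the question.
SEARCH_URL = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s'


def build_search_url(query, page):
    """Return the search URL with the query percent-encoded (hypothetical helper)."""
    # safe='' also escapes '/', which quote() leaves untouched by default.
    return SEARCH_URL % (quote(query, safe=''), page)


print(build_search_url('the dog', 1))
```

Passing `safe=''` is a design choice for query values: characters such as `/` and `&` inside the search term are escaped as well, so they cannot be mistaken for URL structure.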

However, I strongly suggest using requests instead of urllib/urllib2/httplib. It is much easier to use and handles the URL encoding of query parameters for you.

Here is the same code using the Python requests library:

    import requests

    results = requests.get("http://www.bing.com/search",
                           params={'q': query, 'first': page},
                           headers={'User-Agent': User_Agent})
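
To see roughly what passing `params` saves you from doing by hand, here is a standard-library analogue using `urlencode`. This is a sketch of the same idea, not a description of how requests works internally; the sample values stand in for the question's `query` and `page`, and the Python 3 import fallback is an assumption.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Hypothetical values standing in for the question's `query` and `page`.
params = [('q', 'the dog'), ('first', 1)]

# urlencode escapes each value and joins the pairs with '&'.
query_string = urlencode(params)
url = 'http://www.bing.com/search?' + query_string
print(url)
```

Note that `urlencode` writes the space as `+` rather than `%20`; both forms are commonly decoded as a space inside a query string.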
ravenac95 answered Oct 03 '22 09:10


You need to use urllib.quote() on your 'query' variable:

    query = urllib.quote(query)
    host = 'http://www.bing.com/search?q=%s&go=&qs=n&sk=&sc=8-13&first=%s' % (query, page)

This performs the necessary URL escaping: the space in "big dog" becomes %20, giving big%20dog.
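
The conversion can be checked in isolation. The short sketch below compares `quote` with its sibling `quote_plus` and round-trips the result through `unquote`; the variable names are illustrative, and the import fallback is an assumption so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import quote, quote_plus, unquote  # Python 3
except ImportError:
    from urllib import quote, quote_plus, unquote  # Python 2

escaped = quote('big dog')            # space becomes %20
plus_escaped = quote_plus('big dog')  # space becomes +
restored = unquote(escaped)           # back to the original text

print(escaped)
print(plus_escaped)
print(restored)
```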

Zach Young answered Oct 03 '22 08:10