
urllib2.quote does not work properly

I'm trying to get the HTML of a page whose URL contains diacritics (í, č, ...). The problem is that urllib2.quote does not seem to work as I expected.

As far as I understand, quote should convert a URL that contains diacritics into a valid URL.

Here is an example:

url = 'http://www.example.com/vydavatelství/'

print urllib2.quote(url)

>> http%3A//www.example.com/vydavatelstv%C3%AD/

The problem is that it changes the http:// prefix for some reason (the colon becomes %3A). Then urllib2.urlopen(req) raises an error:

    response = urllib2.urlopen(req)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

asked Apr 12 '15 21:04 by Milano




1 Answer

-- TL;DR --

Two things. First, make sure you include the encoding declaration # -*- coding: utf-8 -*- at the top of your Python script (this is a magic comment, not a shebang). It lets Python know how the text in your file is encoded. Second, you need to specify the safe characters, which the quote method leaves unconverted. By default, only / is specified as a safe character. This means that the : is being converted to %3A, which is breaking your URL.

import urllib2

url = 'http://www.example.com/vydavatelství/'
print urllib2.quote(url, ':/')
>>> http://www.example.com/vydavatelstv%C3%AD/
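
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.parse.quote. This is only an illustration under that assumption, not the asker's original Python 2 code; the variable names are made up for the example.

```python
from urllib.parse import quote

url = 'http://www.example.com/vydavatelství/'

# With the default safe='/', slashes are kept but the colon is escaped.
default_quoted = quote(url)

# Adding ':' to the safe characters keeps the scheme separator intact.
safe_quoted = quote(url, safe=':/')

print(default_quoted)  # http%3A//www.example.com/vydavatelstv%C3%AD/
print(safe_quoted)     # http://www.example.com/vydavatelstv%C3%AD/
```

Only the second result is a URL that urlopen can use, since the first no longer starts with a recognizable http:// scheme.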

-- A little more on this --

So the first problem here is that urllib2's documentation is pretty poor. Going off the link that Kamal provided, I see no mention of the quote method in the docs. That makes troubleshooting pretty difficult.

With that said, let me explain this a little bit.

urllib2.quote seems to work the same as urllib's implementation of quote, which is documented pretty well. In Python 2 it takes the string plus an optional safe argument. The Python 3 version, urllib.parse.quote(), takes four parameters:

urllib.parse.quote(string, safe='/', encoding=None, errors=None)
##   string: the string you're trying to encode
##     safe: string containing characters that should not be quoted. Default is '/'
## encoding: encoding used to turn non-ASCII characters into bytes. Default is utf-8
##   errors: specifies how encoding errors are handled. Default is 'strict', which raises a UnicodeEncodeError
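
To make the encoding and errors parameters a bit more concrete, here is a small sketch, assuming Python 3's urllib.parse.quote (the Python 2 function does not accept these two arguments). The variable names are just for illustration.

```python
from urllib.parse import quote

# encoding decides which bytes get percent-escaped.
utf8_quoted = quote('í')                        # UTF-8 is the default
latin1_quoted = quote('í', encoding='latin-1')

# errors='strict' (the default) raises when the character cannot be encoded.
try:
    quote('í', encoding='ascii')
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# errors='replace' substitutes '?', which is then percent-escaped itself.
replaced = quote('í', encoding='ascii', errors='replace')

print(utf8_quoted, latin1_quoted, strict_raised, replaced)
# %C3%AD %ED True %3F
```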

answered Sep 22 '22 17:09 by Austin A