I'm trying to get the HTML of a page whose URL contains diacritics (í, č, ...). The problem is that urllib2.quote
does not seem to work as I expected.
As far as I understand, quote should convert a URL that contains diacritics into a proper URL.
Here is an example:
url = 'http://www.example.com/vydavatelství/'
print urllib2.quote(url)
>> http%3A//www.example.com/vydavatelstv%C3%AD/
The problem is that it changes the http:// prefix for some reason (the colon becomes %3A). Then urllib2.urlopen(req)
returns an error:
    response = urllib2.urlopen(req)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request
NOTE: urllib2 is no longer available in Python 3.x; use urllib instead. Its functionality was split into urllib.request and urllib.error, and quote now lives in urllib.parse.
The quote() function encodes space characters as %20. If you want to encode space characters as a plus sign (+), you can use another function provided by urllib, quote_plus.
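A minimal sketch of the difference, assuming Python 3, where both functions live in urllib.parse. The sample string is made up for illustration.

```python
from urllib.parse import quote, quote_plus

text = "hello world/é"  # sample input, not taken from the question

# quote() percent-encodes the space and leaves "/" alone (its default safe character).
quoted = quote(text)
print(quoted)       # hello%20world/%C3%A9

# quote_plus() turns the space into "+"; it has no default safe characters,
# so "/" is encoded as well.
plus_quoted = quote_plus(text)
print(plus_quoted)  # hello+world%2F%C3%A9
```

quote_plus is mainly meant for query-string values, while quote is the usual choice for the path part of a URL.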
Simple urllib2 script:
import urllib2
response = urllib2.urlopen('http://python.org/')
print "Response:", response
# Get the URL. This gets the real URL.
print "The URL is: ", response.geturl()
# Getting the code
print "This gets the code: ", response.
urllib2 is a Python module that can be used for fetching URLs. It defines functions and classes to help with URL actions (basic and digest authentication, redirections, cookies, etc.). The magic starts with importing the urllib2 module.
To make a simple request with urllib2, begin by importing the urllib2 module. Place the response in a variable (response); the response is now a file-like object. Read the data from the response into a string (html), then do something with that string. Note: if there is a space in the URL, you will need to encode it first (for example with urllib.quote for the path, or urllib.urlencode for query parameters).
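The steps above might look roughly like this in Python 3, where urllib2's urlopen lives in urllib.request. The helper name fetch_text is made up for this sketch, and a data: URL stands in for a real web address so the example runs without network access. In practice you would pass an http:// URL.

```python
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

def fetch_text(url):
    """Open the URL, read the whole body, and decode it as UTF-8 (hypothetical helper)."""
    with urlopen(url) as response:  # the response is a file-like object
        raw = response.read()       # bytes
    return raw.decode("utf-8")

# The space in the payload is written as %20, as the note above requires.
html = fetch_text("data:text/plain,hello%20world")
print(html)
```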
A program on the Internet can work as a client (accessing resources) or as a server (making services available). A URL identifies a resource on the Internet.
-- TL;DR --
Two things. First, make sure you include the encoding declaration # -*- coding: utf-8 -*-
at the top of your Python script (it is a magic comment, not a shebang). This lets Python know how the text in your file is encoded. Second, you need to specify the safe characters, which the quote function leaves unconverted. By default, only the /
is specified as a safe character. This means that the :
is being converted, which breaks your URL.
url = 'http://www.example.com/vydavatelství/'
urllib2.quote(url,':/')
>>> http://www.example.com/vydavatelstv%C3%AD/
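For readers on Python 3, the same fix can be sketched with urllib.parse.quote. The second half shows an alternative that quotes only the path, so the scheme and host are never touched. The helper name quote_url_path is made up for this example.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = 'http://www.example.com/vydavatelství/'

# Option 1: mark ':' as safe in addition to '/', as in the TL;DR.
fixed = quote(url, safe=':/')
print(fixed)  # http://www.example.com/vydavatelstv%C3%AD/

def quote_url_path(raw_url):
    """Percent-encode only the path component of a URL (hypothetical helper)."""
    parts = urlsplit(raw_url)
    return urlunsplit(parts._replace(path=quote(parts.path)))

# Option 2: split the URL first, then quote just the path.
fixed_path_only = quote_url_path(url)
print(fixed_path_only)  # http://www.example.com/vydavatelstv%C3%AD/
```

Quoting only the path avoids having to list every character that is legal elsewhere in a URL, such as the : before a port number.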
-- A little more on this --
So the first problem here is that urllib2's documentation is pretty poor. Going off the link that Kamal provided, I see no mention of the quote
function in the docs. That makes troubleshooting pretty difficult.
With that said, let me explain this a little bit.
urllib2.quote
is the same function as urllib's quote, which is documented pretty well. In Python 2 it accepts only string and safe; the Python 3 version, urllib.parse.quote, takes four parameters:
urllib.parse.quote(string, safe='/', encoding=None, errors=None)
## string: the string you're trying to encode
## safe: a string containing the characters that should not be quoted. Default is '/'
## encoding: the encoding used to turn the string into bytes before quoting. Default is utf-8
## errors: specifies how encoding errors are handled. Default is 'strict', which throws a UnicodeEncodeError, I think.
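A short sketch of how the four parameters listed above behave, assuming Python 3's urllib.parse.quote. The sample strings are made up for illustration.

```python
from urllib.parse import quote

path = '/vydavatelství/a b'  # sample input

# safe: characters listed here are left alone; with safe='' even '/' is encoded.
default_safe = quote(path)
no_safe = quote(path, safe='')
print(default_safe)  # /vydavatelstv%C3%AD/a%20b
print(no_safe)       # %2Fvydavatelstv%C3%AD%2Fa%20b

# encoding: the bytes that get percent-encoded depend on it.
latin1 = quote('í', encoding='latin-1')
print(latin1)        # %ED

# errors: with 'strict', a character the encoding cannot represent raises.
strict_failed = False
try:
    quote('č', encoding='latin-1', errors='strict')
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True

# errors='replace' substitutes '?', which is then quoted as %3F.
replaced = quote('č', encoding='latin-1', errors='replace')
print(replaced)      # %3F
```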