urllib2.quote does not work properly

Tags: python, html, url, urllib2

I'm trying to get the HTML of a page whose URL contains diacritics (í, č...). The problem is that urllib2.quote does not seem to work as I expected.

As far as I understand, quote should convert a URL that contains diacritics into a valid URL.

Here is an example:

url = 'http://www.example.com/vydavatelství/'

print urllib2.quote(url)

>> http%3A//www.example.com/vydavatelstv%C3%AD/

The problem is that it changes the http:// prefix for some reason (the : becomes %3A). Then urllib2.urlopen(req) raises an error:

    response = urllib2.urlopen(req)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

asked Apr 12 '15 21:04

Milano

1 Answer

-- TL;DR --

Two things. First, make sure you include the encoding declaration # -*- coding: utf-8 -*- on the first or second line of your Python script (it is a comment, not a shebang). This lets Python know how the text in your file is encoded. Second, you need to specify the safe characters, which the quote method leaves unconverted. By default, only / is treated as safe, so the : gets converted to %3A, which breaks your URL.

url = 'http://www.example.com/vydavatelství/'
print urllib2.quote(url, ':/')
>> http://www.example.com/vydavatelstv%C3%AD/
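For anyone who wants to try this outside of Python 2.7, here is a minimal sketch of the same fix. It assumes that `urllib.parse.quote` (Python 3) behaves like the Python 2 `urllib.quote` for this input, and `quote_path_only` is a hypothetical helper of mine, not something from urllib2. It quotes only the path component, which avoids touching the scheme and host at all.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urllib import quote  # Python 2
    from urlparse import urlsplit, urlunsplit

url = 'http://www.example.com/vydavatelství/'

# Default: only '/' is safe, so the ':' after the scheme is escaped as %3A.
default_quoted = quote(url)

# Fix from above: also treat ':' as safe so 'http://' survives.
fixed_quoted = quote(url, safe=':/')


def quote_path_only(raw_url):
    """Hypothetical helper: percent-encode only the path of a URL."""
    parts = urlsplit(raw_url)
    # Scheme, host, query and fragment are passed through untouched.
    return urlunsplit(parts._replace(path=quote(parts.path)))


print(default_quoted)
print(fixed_quoted)
print(quote_path_only(url))
```

Passing `safe=':/'` is enough for the URL in the question. Splitting the URL first is probably the safer choice once query strings are involved, since characters like ? and = would otherwise need to be added to safe as well.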

-- A little more on this --

So the first problem here is that urllib2's documentation is pretty poor. Going off the link that Kamal provided, I see no mention of the quote method in the docs, which makes troubleshooting pretty difficult.

With that said, let me explain this a little bit.

urllib2.quote seems to work the same as urllib's implementation of quote, which is documented pretty well. In Python 2 that is urllib.quote(string, safe='/'), which takes only the first two parameters listed here. The Python 3 version, urllib.parse.quote, takes four:

urllib.parse.quote(string, safe='/', encoding=None, errors=None)
##   string: the string you're trying to encode
##     safe: string containing the characters to leave unquoted. Default is '/'
## encoding: encoding used to turn non-ASCII text into bytes (Python 3 only). Default is utf-8
##   errors: specifies how encoding errors are handled (Python 3 only). Default is 'strict', which raises a UnicodeEncodeError
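To make the parameters a bit more concrete, here is a small sketch using Python 3's `urllib.parse.quote` (the `encoding` and `errors` arguments do not exist on the Python 2 function). The sample strings are just illustrations I picked.

```python
from urllib.parse import quote

# safe: characters that are left as they are (default is '/').
keep_slash = quote('a/b c')            # '/' kept, space escaped
escape_slash = quote('a/b c', safe='')  # nothing is safe, '/' escaped too

# encoding: how non-ASCII text becomes bytes before percent-escaping.
utf8_quoted = quote('í')                        # UTF-8 is the default
latin1_quoted = quote('í', encoding='latin-1')  # single byte in Latin-1

# errors: with the default 'strict', an unencodable character raises.
try:
    quote('č', encoding='ascii')
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# errors='replace' substitutes '?' for the character, which is then escaped.
replaced = quote('č', encoding='ascii', errors='replace')

print(keep_slash, escape_slash)
print(utf8_quoted, latin1_quoted)
print(strict_raised, replaced)
```

Unless the server expects a legacy encoding, leaving `encoding` and `errors` at their defaults is usually what you want.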

answered Sep 22 '22 17:09

Austin A
