Python urlparse.parse_qs unicode url

Tags: python, urlencode, django, urlparse

urlparse.parse_qs is useful for parsing URL parameters, and it works fine with a simple ASCII URL represented as a str. So I can parse a query and then reconstruct the same path from the parsed data using urllib.urlencode:

>>> import urlparse
>>> import urllib
>>>
>>> path = '/?key=value' #path is str
>>> query = urlparse.urlparse(path).query
>>> query
'key=value'
>>> query_dict = urlparse.parse_qs(query)
>>> query_dict
{'key': ['value']}
>>> '/?' + urllib.urlencode(query_dict, doseq=True)
'/?key=value' # <-- path is the same here

It also works fine when the URL contains a percent-encoded non-ASCII parameter:

>>> value = urllib.quote(u'значение'.encode('utf8'))
>>> value
'%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> path = '/?key=%s' % value
>>> path
'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> query = urlparse.urlparse(path).query
>>> query
'key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> query_dict = urlparse.parse_qs(query)
>>> query_dict
{'key': ['\xd0\xb7\xd0\xbd\xd0\xb0\xd1\x87\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5']}
>>> '/?' + urllib.urlencode(query_dict, doseq=True)
'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'  # <-- path is the same here

But when using Django, I get the URL from request.get_full_path(), which returns the path as a unicode string:

>>> path = request.get_full_path()
>>> path
u'/?key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5' # path is unicode

Look what happens now:

>>> query = urlparse.urlparse(path).query
>>> query
u'key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'
>>> query_dict = urlparse.parse_qs(query)
>>> query_dict
{u'key': [u'\xd0\xb7\xd0\xbd\xd0\xb0\xd1\x87\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5']}
>>>

query_dict now contains a unicode string whose characters are raw byte values, not the Unicode code points of the original text! And of course I get a UnicodeEncodeError when trying to urlencode that string:

>>> urllib.urlencode(query_dict, doseq=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\Lib\urllib.py", line 1337, in urlencode
    l.append(k + '=' + quote_plus(str(elt)))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-15: ordinal not in range(128)

Currently I have a workaround:

# just convert path, returned by request.get_full_path(), to `str` explicitly:
path = str(request.get_full_path())

So the questions are:

Why does parse_qs return such a strange string (unicode that contains bytes)?
Is it safe to convert the URL to str?


asked May 17 '13 17:05

stalk

1 Answer

Encode the query back to bytes, using ASCII, before passing it to .parse_qs():

query_dict = urlparse.parse_qs(query.encode('ASCII'))

This does the same thing as str() but with an explicit encoding. Yes, this is safe: URL (percent) encoding uses ASCII code points only.
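As a quick sanity check of the ASCII-only claim, here is a minimal sketch. It assumes Python 3, where quote lives in urllib.parse (the question itself uses Python 2's urllib.quote), and the variable names are illustrative only.

```python
from urllib.parse import quote

# Percent-encode the UTF-8 bytes of a non-ASCII value, as in the question.
value = quote('значение'.encode('utf-8'))
query = 'key=' + value

# Every character of the encoded query is ASCII, so encoding it to ASCII
# bytes cannot fail or lose information.
is_ascii = all(ord(ch) < 128 for ch in query)
query_bytes = query.encode('ascii')
round_tripped = query_bytes.decode('ascii')
```

If a URL ever did contain raw, unescaped non-ASCII characters, .encode('ASCII') would raise UnicodeEncodeError rather than silently alter the data, so the conversion fails loudly instead of corrupting anything.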

parse_qs was handed a unicode value, so it returned unicode values too; it is not its job to decode bytes.
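To see where the "unicode that contains bytes" comes from, the sketch below reproduces the effect. The output in the question is consistent with each %XX escape becoming the code point U+00XX, which is the same as decoding the bytes as Latin-1. The sketch assumes Python 3 and uses urllib.parse.parse_qs with encoding='latin-1' to emulate that result; it is an emulation for illustration, not the Python 2 code path itself.

```python
from urllib.parse import parse_qs

query = 'key=%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D0%B5'

# Emulate the result from the question: one character per percent-escaped byte.
byte_like = parse_qs(query, encoding='latin-1')['key'][0]

# Recover the real text: map the characters back to bytes, then decode as UTF-8.
repaired = byte_like.encode('latin-1').decode('utf-8')

# Python 3 decodes the escapes as UTF-8 by default, giving the text directly.
direct = parse_qs(query)['key'][0]
```

Here byte_like has 16 characters, matching "position 0-15" in the traceback, while repaired and direct both hold the original 8-character word.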

answered Oct 12 '22 04:10

Martijn Pieters
