I have a unicode string in Python code:
name = u'Mayte_Martín'
I would like to use it in a SPARQL query, which means I should encode the string as UTF-8 and then apply urllib.quote_plus or requests.quote to it. However, both quote functions behave strangely depending on whether or not the safe argument is passed.
from urllib import quote_plus
Without 'safe' argument:
quote_plus(name.encode('utf-8'))
Output: 'Mayte_Mart%C3%ADn'
With 'safe' argument:
quote_plus(name.encode('utf-8'), safe=':/')
Output:
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-164-556248391ee1> in <module>()
----> 1 quote_plus(v, safe=':/')

/usr/lib/python2.7/urllib.pyc in quote_plus(s, safe)
   1273         s = quote(s, safe + ' ')
   1274         return s.replace(' ', '+')
-> 1275     return quote(s, safe)
   1276
   1277 def urlencode(query, doseq=0):

/usr/lib/python2.7/urllib.pyc in quote(s, safe)
   1264         safe = always_safe + safe
   1265         _safe_quoters[cachekey] = (quoter, safe)
-> 1266     if not s.rstrip(safe):
   1267         return s
   1268     return ''.join(map(quoter, s))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
The problem seems to be in the rstrip call. I tried encoding the safe argument as well:
quote_plus(name.encode('utf-8'), safe=u':/'.encode('utf-8'))
But that did not solve it. What is causing this error?
I'm answering my own question, so that it may help others who face the same issue.
This error occurs when the following import has been executed in the current session (for example, at the start of an IPython session) before anything else:
from __future__ import unicode_literals
With unicode_literals in effect, every plain string literal, including the ':/' passed as safe, is a unicode object instead of a byte str. That breaks the following sequence of code:
from urllib import quote_plus
name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/')
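The same code works fine without unicode_literals. The traceback shows why: quote() builds safe = always_safe + safe, so a unicode safe makes the combined safe unicode too, and s.rstrip(safe) then makes Python 2 decode the UTF-8 byte string s with the ASCII codec, which fails on byte 0xc3. Passing an encoded safe afterwards in the same session can still fail, most likely because quote() caches the combined safe in _safe_quoters keyed on the safe argument; since u':/' == ':/' in Python 2, the unicode entry from the earlier call is reused. Restarting the interpreter clears that cache.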
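The exact error in the traceback can be reproduced by making the implicit conversion explicit. This minimal sketch assumes Python 2's default encoding is ASCII (the usual setting) and simply calls decode('ascii') on the UTF-8 bytes, which is what the interpreter does behind the scenes when a byte str is stripped with a unicode argument. It also runs unchanged on Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

name = 'Mayte_Martín'           # text: unicode under unicode_literals, str on Python 3
encoded = name.encode('utf-8')  # UTF-8 bytes: 'í' becomes the two bytes 0xc3 0xad

# Python 2's str.rstrip(unicode_arg) implicitly decodes the byte string with the
# default ASCII codec; decoding explicitly reproduces the same failure.
try:
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(error)
```

The message names byte 0xc3 at position 10, the first byte of the UTF-8 encoding of í, which matches the traceback.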
According to this bug report, the workaround is to encode the safe argument as well:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from urllib import quote_plus
name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/'.encode('utf-8'))
You must encode both arguments of the quote or quote_plus method to UTF-8 byte strings, not just the string being quoted.
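The same rule can be wrapped in a small helper so that neither argument is ever left as unicode. This is only a sketch: quote_utf8 is a hypothetical name, the try/except import exists only so the snippet also runs on Python 3 (where urllib.parse.quote_plus accepts bytes for both arguments), and on Python 2 it assumes no earlier call in the same session has already cached a unicode safe value.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3 fallback (assumption for portability)


def quote_utf8(text, safe=':/'):
    """Hypothetical helper: percent-encode text with both arguments as UTF-8 bytes."""
    # Encode the value being quoted if it is still text.
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    # Encode the safe characters too, so Python 2's quote() never combines a
    # unicode safe set with a byte string (the cause of the UnicodeDecodeError).
    if not isinstance(safe, bytes):
        safe = safe.encode('utf-8')
    return quote_plus(text, safe=safe)


print(quote_utf8('Mayte_Martín'))  # Mayte_Mart%C3%ADn
```

On Python 3 the helper is only needed for byte input: urllib.parse.quote_plus('Mayte_Martín', safe=':/') already encodes text as UTF-8 by default.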