I have a Unicode string that was encoded on the client side using JavaScript's encodeURIComponent.
If I use the following in Python locally, I get the expected result:
>>> urllib.unquote("Foo%E2%84%A2%20Bar").decode("utf-8")
u'Foo\u2122 Bar'
But when I run this in Google App Engine, I get:
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/s~kaon-log/2.357769827131038147/main.py", line 143, in post
    path_uni = urllib.unquote(h.path).decode('utf-8')
  File "/base/python_runtime/python_dist/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-5: ordinal not in range(128)
I'm still using Python 2.5, in case that makes a difference. What am I missing?
In Python 2, decode() is a method of string objects that converts a byte string in a given encoding into a unicode object. It is the inverse of encode(): you pass the name of the encoding the bytes are in, and it returns the decoded text.
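For comparison, here is a small sketch of the same round trip in Python 3 (not the Python 2.5 runtime from the question), where the two directions live on separate types: str.encode() produces bytes and bytes.decode() produces str.

```python
text = "Foo\u2122 Bar"              # str: a sequence of code points
data = text.encode("utf-8")         # str -> bytes
print(data)                         # b'Foo\xe2\x84\xa2 Bar'

restored = data.decode("utf-8")     # bytes -> str, the inverse of encode()
print(restored == text)             # True

# Unlike Python 2's unicode type, Python 3's str has no decode() method at all,
# so decoding already-decoded text fails loudly instead of going through ascii.
print(hasattr(text, "decode"))      # False
```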
The encodeURIComponent() function encodes a URI component by replacing each instance of certain characters with one, two, three, or four escape sequences representing the UTF-8 encoding of the character (four escape sequences only for characters made up of two "surrogate" code units).
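To see what the server actually receives, the client-side encoding can be approximated in Python 3 with urllib.parse.quote. This is a sketch, not an exact port: encode_uri_component is a hypothetical helper name, and it assumes Python 3.7+, where quote() always leaves ASCII letters, digits and '_.-~' unescaped; adding "!*'()" to safe yields the same unescaped set as encodeURIComponent.

```python
from urllib.parse import quote


def encode_uri_component(text: str) -> str:
    """Hypothetical helper approximating JavaScript's encodeURIComponent.

    Everything outside A-Z a-z 0-9 - _ . ! ~ * ' ( ) is UTF-8 encoded and
    written as %XX escapes (quote() uses uppercase hex, as JavaScript does).
    """
    return quote(text, safe="!*'()")


print(encode_uri_component("Foo\u2122 Bar"))  # Foo%E2%84%A2%20Bar
print(encode_uri_component("a/b?c=d&e"))      # a%2Fb%3Fc%3Dd%26e
```

The three escapes %E2%84%A2 are the UTF-8 bytes of the single character U+2122, which is why the server side has to decode bytes, not characters.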
The decodeURIComponent() function decodes a Uniform Resource Identifier (URI) component previously created by encodeURIComponent or by a similar routine.
urllib.unquote() replaces %xx escapes with their single-character equivalents.
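A minimal sketch of that replacement step, written in Python 3 on bytes (unquote_bytes is a hypothetical name; the standard library's equivalent is urllib.parse.unquote_to_bytes). It makes the key point explicit: each %xx becomes one byte, and turning those bytes into text is a separate decode.

```python
import re

# Matches a percent sign followed by exactly two hex digits.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def unquote_bytes(data: bytes) -> bytes:
    """Replace every %xx escape in `data` with the single byte it names.

    Malformed escapes such as '%zz' are left untouched.
    """
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


raw = unquote_bytes(b"Foo%E2%84%A2%20Bar")
print(raw)                  # b'Foo\xe2\x84\xa2 Bar'
text = raw.decode("utf-8")  # the separate text-decoding step
```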
My guess is that h.path is a unicode object, so urllib.unquote returns a unicode object too. When decode is called on a unicode object, it is first converted to str using the default encoding (which is ascii), and that is where the 'ascii' codec can't encode exception comes from.
Here is a proof:
>>> urllib.unquote(u"Foo%E2%84%A2%20Bar").decode("utf-8")
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-5: ordinal not in range(128)
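The same failure can be modelled in Python 3. Two assumptions, both labeled in the code: Python 2's unquote on a unicode argument is treated as mapping each %xx escape to the code point with that value (equivalent to decoding the escapes as latin-1, which matches the three offending characters at "position 3-5"), and py2_unicode_decode is a hypothetical helper standing in for Python 2's implicit ascii encode inside unicode.decode().

```python
from urllib.parse import unquote

quoted = "Foo%E2%84%A2%20Bar"

# Assumption: one code point per escaped byte, i.e. the escapes read as latin-1.
as_py2_unicode = unquote(quoted, encoding="latin-1")  # 'Foo\xe2\x84\xa2 Bar'


def py2_unicode_decode(text, encoding):
    """Hypothetical helper mimicking Python 2's unicode.decode():
    the text is first encoded with the default 'ascii' codec."""
    return text.encode("ascii").decode(encoding)


error_message = None
try:
    py2_unicode_decode(as_py2_unicode, "utf-8")
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
# 'ascii' codec can't encode characters in position 3-5: ordinal not in range(128)
```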
This should work:
urllib.unquote(h.path.encode('utf-8')).decode("utf-8")
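The idea behind this fix (get to bytes first, unquote the bytes, then decode exactly once) carries over to Python 3. A sketch, with unquote_utf8 as a hypothetical helper name:

```python
from urllib.parse import unquote_to_bytes


def unquote_utf8(path):
    """Percent-decode `path` (str or bytes) and decode the result as UTF-8."""
    if isinstance(path, str):
        # Same role as h.path.encode('utf-8') above: work on bytes, not text.
        path = path.encode("utf-8")
    raw = unquote_to_bytes(path)  # %xx -> raw bytes, no text decoding yet
    return raw.decode("utf-8")    # decode once; raises on invalid UTF-8


result = unquote_utf8("Foo%E2%84%A2%20Bar")  # equals "Foo\u2122 Bar"
```

For well-formed input, Python 3's plain urllib.parse.unquote(path) gives the same result because it decodes as UTF-8 by default; the difference is that unquote uses errors="replace" and substitutes U+FFFD for invalid bytes instead of raising.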
There is a Stack Overflow thread that explains why unicode doesn't work with urllib.unquote: "How to unquote a urlencoded unicode string in python?"