I am trying to encode and store, and decode arguments in Python and getting lost somewhere along the way. Here are my steps:
1) I use google toolkit's gtm_stringByEscapingForURLArgument
to convert an NSString properly for passing into HTTP arguments.
2) On my server (python), I store these string arguments as something like u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\''
(note that these are the standard keys on an iphone keypad in the "123" view and the "#+=" view, the \u
and \x
chars in there being some monetary prefixes like pound, yen, etc)
3) I call urllib.quote(myString,'')
on that stored value, presumably to %-escape them for transport to the client so the client can unpercent escape them.
The result is that I am getting an exception when I try to log the result of % escaping. Is there some crucial step I am overlooking that needs to be applied to the stored value with the \u and \x format in order to properly convert it for sending over http?
Update: The suggestion marked as the answer below worked for me. I am providing some updates to address the comments below to be complete, though.
The exception I received cited an issue with \u20ac
. I don't know if it was a problem with that specifically, rather than the fact that it was the first unicode character in the string.
That \u20ac
char is the unicode for the 'euro' symbol. I basically found I'd have issues with it unless I used the urllib2 quote
method.
In Python 3+, You can URL encode any string using the quote() function provided by urllib. parse package. The quote() function by default uses UTF-8 encoding scheme.
replace('%20+', '') will replace '%20+' with empty string.
url encoding a "raw" unicode doesn't really make sense. What you need to do is .encode("utf8")
first so you have a known byte encoding and then .quote()
that.
The output isn't very pretty but it should be a correct uri encoding.
>>> s = u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\|~<>\u20ac\xa3\xa5\u2022.,?!\'' >>> urllib2.quote(s.encode("utf8")) '1234567890-/%3A%3B%28%29%24%26%40%22.%2C%3F%21%27%5B%5D%7B%7D%23%25%5E%2A%2B%3D_%5C%7C%7E%3C%3E%E2%82%AC%C2%A3%C2%A5%E2%80%A2.%2C%3F%21%27'
Remember that you will need to both unquote()
and decode()
this to print it out properly if you're debugging or whatever.
>>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))) 1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!' >>> # oops, nasty  means we've got a utf8 byte stream being treated as an ascii stream >>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))).decode("utf8") 1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!'
This is, in fact, what the django functions mentioned in another answer do.
The functions django.utils.http.urlquote() and django.utils.http.urlquote_plus() are versions of Python’s standard urllib.quote() and urllib.quote_plus() that work with non-ASCII characters. (The data is converted to UTF-8 prior to encoding.)
Be careful if you are applying any further quotes or encodings not to mangle things.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With