Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

URL encoding/decoding with Python

I am trying to encode and store, and decode arguments in Python and getting lost somewhere along the way. Here are my steps:

1) I use google toolkit's gtm_stringByEscapingForURLArgument to convert an NSString properly for passing into HTTP arguments.

2) On my server (python), I store these string arguments as something like u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\'' (note that these are the standard keys on an iphone keypad in the "123" view and the "#+=" view, the \u and \x chars in there being some monetary prefixes like pound, yen, etc)

3) I call urllib.quote(myString,'') on that stored value, presumably to %-escape them for transport to the client so the client can unpercent escape them.

The result is that I am getting an exception when I try to log the result of % escaping. Is there some crucial step I am overlooking that needs to be applied to the stored value with the \u and \x format in order to properly convert it for sending over http?

Update: The suggestion marked as the answer below worked for me. I am providing some updates to address the comments below to be complete, though.

The exception I received cited an issue with \u20ac. I don't know if it was a problem with that specifically, rather than the fact that it was the first unicode character in the string.

That \u20ac char is the unicode for the 'euro' symbol. I basically found I'd have issues with it unless I used the urllib2 quote method.

like image 635
Joey Avatar asked Aug 25 '10 05:08

Joey


People also ask

How do you encode a URL in Python?

In Python 3+, You can URL encode any string using the quote() function provided by urllib. parse package. The quote() function by default uses UTF-8 encoding scheme.

How do I remove 20 from a URL in Python?

replace('%20+', '') will replace '%20+' with empty string.


1 Answers

url encoding a "raw" unicode doesn't really make sense. What you need to do is .encode("utf8") first so you have a known byte encoding and then .quote() that.

The output isn't very pretty but it should be a correct uri encoding.

>>> s = u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\|~<>\u20ac\xa3\xa5\u2022.,?!\'' >>> urllib2.quote(s.encode("utf8")) '1234567890-/%3A%3B%28%29%24%26%40%22.%2C%3F%21%27%5B%5D%7B%7D%23%25%5E%2A%2B%3D_%5C%7C%7E%3C%3E%E2%82%AC%C2%A3%C2%A5%E2%80%A2.%2C%3F%21%27' 

Remember that you will need to both unquote() and decode() this to print it out properly if you're debugging or whatever.

>>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))) 1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!' >>> # oops, nasty  means we've got a utf8 byte stream being treated as an ascii stream >>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))).decode("utf8") 1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!' 

This is, in fact, what the django functions mentioned in another answer do.

The functions django.utils.http.urlquote() and django.utils.http.urlquote_plus() are versions of Python’s standard urllib.quote() and urllib.quote_plus() that work with non-ASCII characters. (The data is converted to UTF-8 prior to encoding.)

Be careful if you are applying any further quotes or encodings not to mangle things.

like image 103
pycruft Avatar answered Sep 23 '22 23:09

pycruft