 

Percent-encoding a URL with Python

When I enter a URL such as https://dl.dropbox.com/u/94943007/file.kml into maps.google.com, it encodes the URL as:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

What is this encoding called, and is there a way to encode a URL like this using Python?

The process is called URL encoding, so I tried this:

>>> import urllib
>>> urllib.quote('https://dl.dropbox.com/u/94943007/file.kml', '')
'https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'

but I did not get the expected result:

'https%3A//dl.dropbox.com/u/94943007/file.kml'

What I need is this:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

How do I encode this URL properly?

The documentation at https://developers.google.com/maps/documentation/webservices/ states:

All characters to be URL-encoded are encoded using a '%' character and a two-character hex value corresponding to their UTF-8 character. For example, 上海+中國 in UTF-8 would be URL-encoded as %E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B. The string ? and the Mysterians would be URL-encoded as %3F+and+the+Mysterians.
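The two examples from the Google documentation can be reproduced with the standard library. This is a minimal sketch assuming Python 3, where Python 2's urllib.quote_plus() lives at urllib.parse.quote_plus(); it encodes text as UTF-8 by default, and with its default safe='' it percent-encodes '+' and '?' while turning spaces into plus signs.

```python
from urllib.parse import quote_plus

# quote_plus() encodes non-ASCII text as UTF-8 bytes, writes each byte as %XX,
# and replaces spaces with '+'; its default safe='' also encodes '+' and '?'.
chinese = quote_plus('上海+中國')
mysterians = quote_plus('? and the Mysterians')

print(chinese)     # %E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B
print(mysterians)  # %3F+and+the+Mysterians
```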

asked Aug 24 '12 18:08 by Alex Gordon


1 Answer

Use

urllib.quote_plus(url, safe=':')

Since you don't want the colon encoded, you need to specify that when calling urllib.quote():

>>> import urllib
>>> expected = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'
>>> url = 'https://dl.dropbox.com/u/94943007/file.kml'
>>> urllib.quote(url, safe=':') == expected
True
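For reference, the same check as a minimal sketch on Python 3, where quote() and quote_plus() moved to urllib.parse but keep the same safe keyword. Because this particular URL contains no spaces, both functions give the same result here, and unquote() reverses the encoding.

```python
from urllib.parse import quote, quote_plus, unquote

url = 'https://dl.dropbox.com/u/94943007/file.kml'
expected = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'

# Keep ':' literal; every '/' becomes %2F.
# Letters, digits and '_.-~' are never encoded, so '.' stays as-is.
encoded = quote_plus(url, safe=':')

print(encoded == expected)              # True
print(quote(url, safe=':') == encoded)  # True: no spaces, so quote() agrees
print(unquote(encoded) == url)          # True: decoding restores the original URL
```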

urllib.quote() takes a keyword argument safe that defaults to '/' and lists the characters that are considered safe and therefore are not encoded. In your first example you passed '', which caused the slashes to be encoded as well. The unexpected output you pasted below it, where the slashes weren't encoded, probably came from an earlier attempt in which you didn't set safe at all.

Overriding the default '/' with ':', so that the colon rather than the slash is left unencoded, is what finally yields the desired result.
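To see how safe changes the output, the sketch below (assuming Python 3 and urllib.parse.quote) encodes the URL three ways: with the default safe='/', with safe='' as in the question's first attempt, and with safe=':' as suggested above. The first two reproduce the two different outputs quoted in the question.

```python
from urllib.parse import quote

url = 'https://dl.dropbox.com/u/94943007/file.kml'

# Default safe='/': slashes stay literal, only the colon is encoded.
default = quote(url)
# safe='': only letters, digits and '_.-~' are kept, so ':' and '/' are both encoded.
nothing_safe = quote(url, safe='')
# safe=':': the colon stays literal, the slashes are encoded.
colon_safe = quote(url, safe=':')

for result in (default, nothing_safe, colon_safe):
    print(result)
# https%3A//dl.dropbox.com/u/94943007/file.kml
# https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
# https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
```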

Edit: Additionally, the API calls for spaces to be encoded as plus signs, so urllib.quote_plus() should be used instead (its safe keyword argument defaults to '' rather than '/').
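The difference between the two functions only shows up when the input contains spaces. The sketch below assumes Python 3's urllib.parse and uses a made-up Dropbox path with a space in the file name (hypothetical, for illustration only): quote() emits %20, while quote_plus() emits '+' as the Maps documentation asks. Passing safe=':' explicitly matters for both, since quote() would otherwise keep '/' and quote_plus() would otherwise encode ':'.

```python
from urllib.parse import quote, quote_plus

# Hypothetical URL with a space in the file name, for illustration only.
url = 'https://dl.dropbox.com/u/94943007/my file.kml'

with_quote = quote(url, safe=':')            # space -> %20
with_quote_plus = quote_plus(url, safe=':')  # space -> +

print(with_quote)       # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Fmy%20file.kml
print(with_quote_plus)  # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Fmy+file.kml
```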

answered Oct 16 '22 09:10 by Lukas Graf