How can I decode percent-encoded characters to ordinary unicode characters?
For example, Lech_Kaczy%C5%84ski should become Lech_Kaczyński.
I tried urllib.unquote(text), but then got Lech_Kaczy\xc5\x84ski.
I also tried the following, but it doesn't change the result:
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
In Python 3, you can URL-encode any string using the quote() function provided by the urllib.parse module. By default, quote() uses the UTF-8 encoding scheme and treats the / character as safe, which means it does not encode /.
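As a small illustration of that default, the sketch below encodes the same string twice, once with the default safe characters and once with safe="" so that / is escaped as well. The sample string is made up for the example.

```python
from urllib.parse import quote

# Hypothetical sample value containing a non-ASCII letter, a slash and a space.
name = "Lech_Kaczyński/biography page"

# Default call: text is encoded as UTF-8 and "/" is treated as safe.
default_encoded = quote(name)

# Passing safe="" makes quote() escape "/" too.
fully_encoded = quote(name, safe="")

print(default_encoded)  # Lech_Kaczy%C5%84ski/biography%20page
print(fully_encoded)    # Lech_Kaczy%C5%84ski%2Fbiography%20page
```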
URL decoding query strings or form parameters in Python
URL decoding, as the name suggests, is the inverse operation of URL encoding. It is often needed when you are reading query strings or form parameters received from a client. By default, HTML forms use the application/x-www-form-urlencoded content type for sending parameters.
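For form-style data, a minimal sketch in Python 3 might look like the following. It assumes a made-up query string; parse_qs and unquote_plus are both part of urllib.parse, and both treat + as an encoded space, which plain unquote does not.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Hypothetical query string, as a form submission might produce it.
query = "name=Lech+Kaczy%C5%84ski&lang=pl"

# parse_qs splits the pairs and decodes them; every value is a list.
params = parse_qs(query)
print(params)  # {'name': ['Lech Kaczyński'], 'lang': ['pl']}

# unquote_plus decodes a single form-encoded value ("+" becomes a space).
plus_decoded = unquote_plus("Lech+Kaczy%C5%84ski")
print(plus_decoded)  # Lech Kaczyński

# Plain unquote leaves "+" untouched.
plain_decoded = unquote("Lech+Kaczy%C5%84ski")
print(plain_decoded)  # Lech+Kaczyński
```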
The term URL encoding is a bit inexact because the encoding procedure is not limited to URLs (Uniform Resource Locators), but can also be applied to any other URIs (Uniform Resource Identifiers) such as URNs (Uniform Resource Names). Therefore, the term percent-encoding should be preferred.
For worldwide interoperability, URIs have to be encoded uniformly. To map the wide range of characters used worldwide into the 60 or so allowed characters in a URI, a two-step process is used: first, convert the character string into a sequence of bytes using the UTF-8 encoding; then, convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte.
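The two-step process can be sketched by hand: encode the text as UTF-8 bytes, then write every byte outside a small safe set as %HH. The helper name percent_encode and the choice of safe set (the RFC 3986 unreserved characters) are assumptions made for this illustration; in real code, urllib.parse.quote does this for you.

```python
import string
from urllib.parse import quote

# Assumption: keep only the RFC 3986 "unreserved" characters unescaped.
UNRESERVED = (string.ascii_letters + string.digits + "-._~").encode("ascii")

def percent_encode(text):
    """Hypothetical helper showing the two steps explicitly."""
    raw = text.encode("utf-8")  # step 1: characters -> UTF-8 bytes
    # step 2: every byte outside the safe set becomes %HH
    return "".join(
        chr(b) if b in UNRESERVED else "%{:02X}".format(b)
        for b in raw
    )

encoded = percent_encode("Lech_Kaczyński")
print(encoded)  # Lech_Kaczy%C5%84ski

# The standard library gives the same result for this input.
print(encoded == quote("Lech_Kaczyński", safe=""))  # True
```

The letter ń is U+0144, which UTF-8 stores as the two bytes C5 84, hence %C5%84.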
For Python 3, using urllib.parse.unquote:
from urllib.parse import unquote
print(unquote("Lech_Kaczy%C5%84ski"))
Output:
Lech_Kaczyński
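To see where the \xc5\x84 in the question comes from, here is a rough Python 3 sketch that separates the two stages: unquote_to_bytes only undoes the percent-escapes, and decoding those bytes as UTF-8 yields the text. The caf%E9 input is a made-up example of data that is not valid UTF-8.

```python
from urllib.parse import unquote, unquote_to_bytes

# Stage 1: undo the %HH escapes, leaving raw bytes.
raw = unquote_to_bytes("Lech_Kaczy%C5%84ski")
print(raw)  # b'Lech_Kaczy\xc5\x84ski'

# Stage 2: interpret those bytes as UTF-8 text.
text = raw.decode("utf-8")
print(text)  # Lech_Kaczyński

# unquote() does both stages; it defaults to UTF-8 with errors="replace",
# so bytes that are not valid UTF-8 become U+FFFD instead of raising.
lenient = unquote("caf%E9")
print(lenient == "caf\ufffd")  # True

# If the data was encoded with another charset, say so explicitly.
latin = unquote("caf%E9", encoding="latin-1")
print(latin)  # café
```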
For Python 2, using urllib.unquote, which returns a byte string that you then decode as UTF-8:
import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')
This will return a unicode string:
u'Lech_Kaczy\u0144ski'
which you can then print and process as usual. For example:
print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))
will result in
Lech_Kaczyński
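This worked for me in Python 2 (unquote returns the raw UTF-8 bytes, so the terminal must use UTF-8 to display them correctly):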
import urllib
print urllib.unquote('Lech_Kaczy%C5%84ski')
Prints out
Lech_Kaczyński