
How can I encode and decode percent-encoded (URL encoded) strings in Python?

How can I decode percent-encoded characters to ordinary unicode characters?

  • Input string: Lech_Kaczy%C5%84ski
  • Desired output: Lech_Kaczyński

I tried urllib.unquote(text), but got the raw UTF-8 bytes instead: Lech_Kaczy\xc5\x84ski.

I also tried the following, but it doesn't change the result:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
yak asked Oct 15 '15 08:10


3 Answers

For Python 3, using urllib.parse.unquote:

from urllib.parse import unquote

print(unquote("Lech_Kaczy%C5%84ski"))

Output:

Lech_Kaczyński
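Since the question also asks about encoding, here is a minimal round-trip sketch for Python 3 using urllib.parse.quote alongside unquote. The variable names (original, encoded, decoded) are only illustrative and do not come from the question. Both functions use UTF-8 by default.

```python
from urllib.parse import quote, unquote

original = "Lech_Kaczyński"

# quote() encodes the text as UTF-8 by default and percent-escapes the
# bytes of non-ASCII characters; letters, digits and "_" are left as-is.
encoded = quote(original)
print(encoded)  # Lech_Kaczy%C5%84ski

# unquote() reverses this, decoding the escaped bytes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded == original)  # True
```

Note that quote() treats "/" as safe by default; pass safe="" if slashes should be escaped as well.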
Mateen Ulhaq answered Sep 30 '22 04:09


For Python 2, using urllib.unquote:

import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')

This will return a unicode string:

u'Lech_Kaczy\u0144ski'

which you can then print and process as usual. For example:

print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))

will result in

Lech_Kaczyński
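The Python 2 answer works in two steps: unquote first yields the raw UTF-8 bytes, and .decode turns them into text. As a rough illustration, the same two steps can be made explicit in Python 3 with urllib.parse.unquote_to_bytes. The names raw and text are hypothetical and used only for this sketch.

```python
from urllib.parse import unquote_to_bytes

# Step 1: undo the percent-escapes only, which yields the raw bytes.
raw = unquote_to_bytes("Lech_Kaczy%C5%84ski")
print(raw)  # b'Lech_Kaczy\xc5\x84ski'

# Step 2: interpret those bytes as UTF-8 to get the text.
text = raw.decode("utf-8")
print(text)  # Lech_Kaczyński
```

The intermediate value in step 1 is the same byte sequence the question reports seeing, which suggests the missing piece there was the UTF-8 decode.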
Matthias C. M. Troffaes answered Sep 30 '22 04:09


This worked for me in Python 2, on a terminal that uses UTF-8:

import urllib

print urllib.unquote('Lech_Kaczy%C5%84ski')

This prints:

Lech_Kaczyński
answerzilla answered Sep 30 '22 03:09