
urllib.unquote_plus(s) does not convert plus symbol to space

Tags:

python

According to the documentation, urllib.unquote_plus should replace plus signs with spaces. But when I tried the code below in IDLE for Python 2.7, it did not.

>>> import urllib
>>> s = 'http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx+xx+xx'

I also tried urllib.unquote_plus(s).decode('utf-8'). Is there a proper way to decode the URL component?

jjennifer asked Sep 06 '13 16:09


2 Answers

%2B is the escape code for a literal +; it is being unescaped entirely correctly.

Don't confuse this with a bare + in the URL, which is the escape character for a space:

>>> s = 'http://stackoverflow.com/questions/?q1=xx+xx+xx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx xx xx'

unquote_plus() only decodes encoded spaces to literal spaces ('+' -> ' '), not encoded + symbols ('%2B' -> '+').
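To make the distinction concrete, here is a minimal sketch that decodes three sample query strings. The sample values and variable names are made up for illustration, and the try/except import is only there so the snippet also runs on Python 3, where these functions live in urllib.parse.

```python
try:
    from urllib.parse import unquote_plus  # Python 3 location
except ImportError:
    from urllib import unquote_plus  # Python 2.7, as used in the question

# Hypothetical sample inputs, not taken from a real request.
encoded_spaces = 'q1=xx+xx+xx'      # bare '+' stands for a space
encoded_pluses = 'q1=xx%2Bxx%2Bxx'  # '%2B' stands for a literal '+'
mixed = 'q1=1%2B1+is+2'             # both forms in one value

spaces_result = unquote_plus(encoded_spaces)
pluses_result = unquote_plus(encoded_pluses)
mixed_result = unquote_plus(mixed)

print(spaces_result)  # q1=xx xx xx
print(pluses_result)  # q1=xx+xx+xx
print(mixed_result)   # q1=1+1 is 2
```

The mixed case works because unquote_plus() replaces bare + characters with spaces first and only then expands the % escapes, so a + that comes out of %2B is never turned into a space.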

If the input you need to decode uses %2B where you expected + for spaces, those values were perhaps quoted twice, and you'd need to unquote them twice. In that case you'd see the % escapes encoded too:

>>> urllib.quote_plus('Hello world!')
'Hello+world%21'
>>> urllib.quote_plus(urllib.quote_plus('Hello world!'))
'Hello%2Bworld%2521'

where %25 is the quoted % character.
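Going the other way, a rough sketch of undoing the double quoting looks like this. It assumes you already know the value was quoted exactly twice; the variable names are just for illustration.

```python
try:
    from urllib.parse import quote_plus, unquote_plus  # Python 3 location
except ImportError:
    from urllib import quote_plus, unquote_plus  # Python 2.7

original = 'Hello world!'
quoted_twice = quote_plus(quote_plus(original))  # 'Hello%2Bworld%2521'

# Each unquote_plus() call peels off exactly one layer of quoting.
decoded_once = unquote_plus(quoted_twice)   # 'Hello+world%21'
decoded_twice = unquote_plus(decoded_once)  # 'Hello world!'

# Caution: unquoting twice corrupts a value that was only quoted once,
# because a legitimate literal '+' becomes a space on the second pass.
over_decoded = unquote_plus(unquote_plus('1%2B1'))  # '1 1', not '1+1'

print(decoded_once)
print(decoded_twice)
print(over_decoded)
```

Because of that last case, it is usually safer to find and fix whatever quoted the value twice than to unquote everything twice on the receiving side.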

Martijn Pieters answered Sep 25 '22 07:09


Those aren't spaces, those are actual pluses. A space is %20, which in that part of the URL is indeed equivalent to +, but %2B means a literal plus.
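A small sketch of the three encodings side by side, comparing unquote() with unquote_plus(). The query string is an invented example, and the import fallback is only there so it also runs on Python 3.

```python
try:
    from urllib.parse import unquote, unquote_plus  # Python 3 location
except ImportError:
    from urllib import unquote, unquote_plus  # Python 2.7

# Hypothetical value containing '%20', a bare '+', and '%2B'.
query = 'q1=a%20b+c%2Bd'

plain = unquote(query)            # expands % escapes only, '+' is untouched
form_style = unquote_plus(query)  # also treats a bare '+' as a space

print(plain)       # q1=a b+c+d
print(form_style)  # q1=a b c+d
```

In both results %2B ends up as a literal plus; only the bare + is treated differently.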

gcbirzan answered Sep 25 '22 07:09