According to the documentation, urllib.unquote_plus should replace plus signs with spaces, but when I tried the code below in IDLE for Python 2.7, it did not.
>>> import urllib
>>> s = 'http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx+xx+xx'
I also tried urllib.unquote_plus(s).decode('utf-8').
Is there a proper way to decode the URL component?
%2B is the escape code for a literal +; it is being unescaped entirely correctly.
Don't confuse this with the URL-escaped +, which is the escape character for spaces:
>>> s = 'http://stackoverflow.com/questions/?q1=xx+xx+xx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx xx xx'
unquote_plus() turns only the + escape into a space ('+' -> ' '); an encoded plus is decoded to a literal plus ('%2B' -> '+') and is never turned into a space.
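To make the difference concrete, here is a small sketch comparing unquote() and unquote_plus() on the three relevant inputs. The sample strings and the `decoded` dict are made up for illustration, and the try/except import is only there so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.parse import unquote, unquote_plus  # Python 3
except ImportError:
    from urllib import unquote, unquote_plus  # Python 2.7

# Illustrative inputs: a raw plus, an encoded space, an encoded plus.
samples = ['xx+xx', 'xx%20xx', 'xx%2Bxx']

# Map each input to (unquote result, unquote_plus result).
decoded = dict((raw, (unquote(raw), unquote_plus(raw))) for raw in samples)

for raw in samples:
    plain, plus = decoded[raw]
    print('%-8s unquote=%r unquote_plus=%r' % (raw, plain, plus))
```

Only the raw + is treated differently by the two functions; %2B comes back as a literal plus either way.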
If your input uses %2B instead of + where you expected spaces, then the values were perhaps quoted twice, and you would need to unquote them twice. In that case the % escapes would be encoded too:
>>> urllib.quote_plus('Hello world!')
'Hello+world%21'
>>> urllib.quote_plus(urllib.quote_plus('Hello world!'))
'Hello%2Bworld%2521'
where %25 is the quoted % character.
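If the input really was quoted twice, undoing it could look roughly like the sketch below. The variable names are illustrative, and the import fallback is an assumption so the same code runs on Python 2.7 and Python 3. Only unquote twice when you know the data was double quoted, because a second pass would corrupt values that legitimately contain + or % sequences.

```python
try:
    from urllib.parse import quote_plus, unquote_plus  # Python 3
except ImportError:
    from urllib import quote_plus, unquote_plus  # Python 2.7

# Simulate a value that was accidentally quoted twice.
double_quoted = quote_plus(quote_plus('Hello world!'))

# The first pass undoes the outer quoting: %2B -> + and %25 -> %.
once = unquote_plus(double_quoted)

# The second pass undoes the inner quoting: + -> space and %21 -> !.
twice = unquote_plus(once)

print(double_quoted)
print(once)
print(twice)
```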
Those aren't spaces, those are actual pluses. A space is %20, which in that part of the URL is indeed equivalent to +, but %2B means a literal plus.
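Since the + convention applies to the query string, one option is to let the standard library split the URL and decode the query parameters for you. This is a sketch rather than the only way to do it; on Python 2.7 urlsplit and parse_qs live in the urlparse module, on Python 3 in urllib.parse, and the variable names are illustrative.

```python
try:
    from urllib.parse import urlsplit, parse_qs  # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qs  # Python 2.7

url_with_encoded_plus = 'http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx'
url_with_raw_plus = 'http://stackoverflow.com/questions/?q1=xx+xx+xx'

# parse_qs decodes each value: %2B becomes a literal plus, + becomes a space.
params_encoded = parse_qs(urlsplit(url_with_encoded_plus).query)
params_raw = parse_qs(urlsplit(url_with_raw_plus).query)

print(params_encoded)
print(params_raw)
```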