These functions seem to do the same thing.
irb> CGI.unescape "Sloths%3A+Society+and+Habitat"
=> "Sloths: Society and Habitat"
irb> URI.decode_www_form_component "Sloths%3A+Society+and+Habitat"
=> "Sloths: Society and Habitat"
What's the difference?
These methods are very similar. Both accept a string and an optional encoding, and both return the string with its % escapes decoded (and + turned into a space), tagged with the specified encoding. But there are differences:
URI.decode_www_form_component raises an ArgumentError if the string contains invalid escape sequences.
URI.decode_www_form_component('%xz')
# ArgumentError: invalid %-encoding (%xz)
CGI.unescape simply ignores them.
CGI.unescape('%xz')
# "%xz"
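If you want strict decoding by default but would rather not crash on malformed input, the two behaviours can be combined. This is only a minimal sketch; decode_component is a hypothetical helper name, not part of either library:

```ruby
require "cgi"
require "uri"

# Hypothetical helper: try the strict decoder first and fall back to the
# lenient one when the input contains an invalid escape sequence.
def decode_component(str)
  URI.decode_www_form_component(str)
rescue ArgumentError
  # CGI.unescape leaves malformed escapes such as "%xz" untouched.
  CGI.unescape(str)
end

decode_component("Sloths%3A+Society+and+Habitat")
# => "Sloths: Society and Habitat"
decode_component("%xz")
# => "%xz"
```

Whether a silent fallback is appropriate depends on where the input comes from; for untrusted input you may prefer to let the ArgumentError propagate.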
CGI.unescape ignores your specified encoding if the result would be invalid in that encoding, and keeps the encoding of the input string instead:
p CGI.unescape("\u263a", 'ASCII')
# "☺"
URI.decode_www_form_component doesn't care and applies the encoding anyway, even when the resulting string is invalid in it:
p URI.decode_www_form_component("\u263a", 'ASCII')
# "\xE2\x98\xBA"
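The same difference shows up when the non-ASCII bytes arrive percent-encoded, which is the more common case in practice. A small sketch that inspects the results with String#valid_encoding? (the variable names are only for illustration):

```ruby
require "cgi"
require "uri"

# "%E2%98%BA" is the percent-encoded UTF-8 byte sequence for U+263A.
input = "%E2%98%BA"

from_cgi = CGI.unescape(input, "ASCII")
from_uri = URI.decode_www_form_component(input, "ASCII")

# CGI.unescape notices the bytes are not valid ASCII and keeps the
# input string's encoding, so the result is still a valid string.
from_cgi.valid_encoding?  # => true

# URI.decode_www_form_component tags the bytes as US-ASCII regardless.
from_uri.encoding         # => #<Encoding:US-ASCII>
from_uri.valid_encoding?  # => false
```

So with URI.decode_www_form_component it is worth checking valid_encoding? yourself if the target encoding might not fit the data.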
Lastly (and I hesitate to even mention this), URI.decode_www_form_component is slightly faster because it uses a precomputed Hash to decode all 485 valid escape codes (Hash keys are case-sensitive, so every upper- and lowercase spelling of each code is stored, plus +), whereas CGI.unescape actually interprets the hex code and repacks it as a character.
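To make the two strategies concrete, here is a rough sketch of each. It is not the libraries' actual source: DECODE_TABLE, decode_with_table and decode_with_pack are hypothetical names, neither function validates its input, and both assume the result should be UTF-8.

```ruby
# Strategy 1: precomputed lookup table.
# Every byte gets one key per upper/lowercase spelling of its hex code.
DECODE_TABLE = {}
256.times do |byte|
  hex = format("%02X", byte)
  hi, lo = hex[0], hex[1]
  [hi, hi.downcase].product([lo, lo.downcase]).each do |h, l|
    DECODE_TABLE["%#{h}#{l}"] = byte.chr
  end
end
DECODE_TABLE["+"] = " "
DECODE_TABLE.freeze

def decode_with_table(str)
  # gsub with a Hash replaces each match with the value stored under it.
  str.b.gsub(/\+|%\h\h/, DECODE_TABLE).force_encoding(Encoding::UTF_8)
end

# Strategy 2: interpret the hex digits at decode time.
def decode_with_pack(str)
  str.tr("+", " ").b
     .gsub(/(?:%\h\h)+/) { |run| [run.delete("%")].pack("H*") }
     .force_encoding(Encoding::UTF_8)
end

DECODE_TABLE.size                                  # => 485
decode_with_table("Sloths%3A+Society+and+Habitat") # => "Sloths: Society and Habitat"
decode_with_pack("Sloths%3A+Society+and+Habitat")  # => "Sloths: Society and Habitat"
```

The table size works out to 22 × 22 spellings of the two hex digits (ten digits plus six letters in either case) plus one entry for +, which is where 485 comes from.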