
What's the difference between CGI.unescape and URI.decode_www_form_component?

Tags: ruby, encoding

These functions seem to do the same thing.

irb> CGI.unescape "Sloths%3A+Society+and+Habitat"
=> "Sloths: Society and Habitat"

irb> URI.decode_www_form_component "Sloths%3A+Society+and+Habitat"
=> "Sloths: Society and Habitat"

What's the difference?

djb asked Sep 02 '25 06:09


1 Answer

These methods are very similar. Both take a string and an optional encoding, decode the % escapes (and turn each + into a space), and return the result in the specified encoding. But there are differences:

Invalid escapes

URI.decode_www_form_component raises an ArgumentError if the string contains invalid escape sequences.

URI.decode_www_form_component('%xz')
# ArgumentError: invalid %-encoding (%xz)

CGI.unescape leaves them in the output untouched.

CGI.unescape('%xz')
# "%xz"
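If you want the strict validation without an exception propagating, one option is to wrap the call. This is only a minimal sketch; `strict_decode` is a hypothetical helper name, not part of either library.

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper: decode strictly, but return nil on a malformed
# escape instead of letting ArgumentError propagate.
def strict_decode(str)
  URI.decode_www_form_component(str)
rescue ArgumentError
  nil
end

valid   = strict_decode('a%20b')   # well-formed escape, gets decoded
invalid = strict_decode('%xz')     # malformed escape, nil from the rescue
lenient = CGI.unescape('%xz')      # CGI.unescape passes it through as-is
```

Returning nil makes the caller decide what to do with bad input, whereas CGI.unescape hides the problem by passing the malformed sequence through.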

Invalid encodings

CGI.unescape ignores the encoding you specify if the result would be invalid in that encoding, and keeps the encoding of the input string instead.

p CGI.unescape("\u263a", 'ASCII')
# "☺"

URI.decode_www_form_component doesn't care and applies the requested encoding anyway, even when the result is invalid.

p URI.decode_www_form_component("\u263a", 'ASCII')
# "\xE2\x98\xBA"
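The difference is easier to see by inspecting the encoding of each result rather than its printed form. A small sketch using the same input as above (the variable names are just for illustration):

```ruby
require 'cgi'
require 'uri'

input = "\u263a"  # a UTF-8 string with no % escapes in it

from_cgi = CGI.unescape(input, 'ASCII')
from_uri = URI.decode_www_form_component(input, 'ASCII')

# CGI.unescape keeps the input's encoding, so the string stays valid.
cgi_encoding = from_cgi.encoding
cgi_valid    = from_cgi.valid_encoding?

# URI.decode_www_form_component tags the bytes as US-ASCII regardless,
# which leaves a string that fails valid_encoding?.
uri_encoding = from_uri.encoding
uri_valid    = from_uri.valid_encoding?
```

If you use the URI method with a non-default encoding, checking `valid_encoding?` on the result is a cheap way to catch this case.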

Lastly (and I hesitate to even mention this), URI.decode_www_form_component is slightly faster because it decodes with a precomputed Hash covering all 485 valid escape codes (the lookup is case-sensitive, so every upper- and lowercase spelling gets its own entry), whereas CGI.unescape actually interprets each hex code and repacks it as a character.
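To make the two strategies concrete, here is a rough sketch of each. It is not the actual library source, and the real implementations vary between Ruby versions; `DECODE_TABLE`, `table_decode` and `hex_decode` are names made up for this illustration.

```ruby
# Strategy 1: precomputed table. Every spelling of every escape gets its
# own key: 22 hex digit characters squared, plus '+', gives 485 entries.
HEX_CHARS = '0123456789abcdefABCDEF'.chars
DECODE_TABLE = { '+' => ' ' }
HEX_CHARS.product(HEX_CHARS).each do |hi, lo|
  DECODE_TABLE["%#{hi}#{lo}"] = (hi + lo).hex.chr
end

def table_decode(str)
  # gsub with a Hash looks each match up directly, no parsing needed.
  str.b.gsub(/\+|%\h\h/, DECODE_TABLE).force_encoding(Encoding::UTF_8)
end

# Strategy 2: interpret the hex digits of each escape at decode time.
def hex_decode(str)
  str.tr('+', ' ').b.gsub(/%(\h\h)/) { $1.hex.chr }
     .force_encoding(Encoding::UTF_8)
end

table_result = table_decode('Sloths%3A+Society+and+Habitat')
hex_result   = hex_decode('Sloths%3A+Society+and+Habitat')
```

Both produce the same string; the table version trades a little memory for skipping the hex parsing on every match.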

Max answered Sep 04 '25 22:09


