What's the difference between CGI.unescape and URI.decode_www_form_component?

Question

These functions seem to do the same thing.

irb> CGI.unescape "Sloths%3A+Society+and+Habitat"
=> "Sloths: Society and Habitat"

irb> URI.decode_www_form_component "Sloths%3A+Society+and+Habitat"
=> "Sloths: Society and Habitat"

What's the difference?

Max · Accepted Answer

These methods are very similar. Both accept a string and an optional encoding, decode the %XX escapes (turning + into a space), and return a string in the specified encoding. But there are differences:
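Before getting to those differences, here is a quick sanity check. This is a minimal sketch, assuming a Ruby where `require 'cgi'` still provides CGI.unescape; the sample inputs are chosen for illustration only. It runs both methods over a few well-formed strings (a + for space, upper- and lower-case hex, multi-byte UTF-8), and they give identical results, both in UTF-8 by default.

```ruby
require 'cgi'
require 'uri'

# Sample inputs (chosen for illustration): '+' for space, an upper-case escape,
# a lower-case escape, and multi-byte UTF-8 sequences.
samples = ['Sloths%3A+Society+and+Habitat', 'a%3ab', 'caf%C3%A9', '%E2%98%BA']

# Decode each sample with both methods, using their default encoding (UTF-8).
results = samples.map do |s|
  [CGI.unescape(s), URI.decode_www_form_component(s)]
end

samples.zip(results).each do |s, (cgi, uri)|
  puts "#{s}: #{cgi.inspect} #{cgi == uri ? 'same' : 'different'}"
end
```

The differences below only show up once the input is malformed or the requested encoding doesn't fit the decoded bytes.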

Invalid escapes

URI.decode_www_form_component raises an ArgumentError if the string contains invalid escape sequences.

URI.decode_www_form_component('%xz')
# ArgumentError: invalid %-encoding (%xz)

CGI.unescape simply leaves them in the output unchanged.

CGI.unescape('%xz')
# "%xz"
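If you want to choose between those two behaviours yourself, something like the following sketch works. Both helper names (`well_formed_escapes?`, `lenient_decode`) are hypothetical, and the regexp is my own check for "every % is followed by two hex digits", not an official API.

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper: true when every '%' is followed by two hex digits,
# i.e. when the string contains no malformed %-escapes.
def well_formed_escapes?(str)
  !str.match?(/%(?!\h\h)/)
end

# Hypothetical helper: decode strictly when possible; if URI rejects the
# input, fall back to CGI.unescape, which leaves malformed escapes in place.
def lenient_decode(str)
  URI.decode_www_form_component(str)
rescue ArgumentError
  CGI.unescape(str)
end

p well_formed_escapes?('%xz')
p lenient_decode('%xz%3A')
```

Note that the fallback still decodes the valid escapes around the bad one, which is exactly CGI.unescape's behaviour.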

Invalid encodings

CGI.unescape ignores your specified encoding if the decoded result would be invalid in it, and falls back to the encoding of the input string:

p CGI.unescape("\u263a", 'ASCII')
# "☺"

URI.decode_www_form_component doesn't check; it applies the requested encoding even when the bytes are invalid in it:

p URI.decode_www_form_component("\u263a", 'ASCII')
# "\xE2\x98\xBA"
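You can see the difference directly by inspecting `encoding` and `valid_encoding?` on the two results. The sketch also includes `strict_decode`, a hypothetical helper (not part of either library) that extends URI's strictness to encodings by raising when the decoded bytes are invalid.

```ruby
require 'cgi'
require 'uri'

input = "\u263a" # UTF-8 string whose bytes (E2 98 BA) are not valid US-ASCII

cgi = CGI.unescape(input, 'ASCII')
uri = URI.decode_www_form_component(input, 'ASCII')

# CGI.unescape fell back to the input's encoding, so the result is still valid.
puts "CGI: #{cgi.encoding} valid=#{cgi.valid_encoding?}"
# URI.decode_www_form_component kept the requested encoding; the bytes are invalid in it.
puts "URI: #{uri.encoding} valid=#{uri.valid_encoding?}"

# Hypothetical helper: URI-style decoding that also rejects results which are
# not valid in the requested encoding.
def strict_decode(str, enc = Encoding::UTF_8)
  decoded = URI.decode_www_form_component(str, enc)
  unless decoded.valid_encoding?
    raise ArgumentError, "decoded bytes are not valid #{decoded.encoding}"
  end
  decoded
end
```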

Lastly (and I hesitate to even mention this), URI.decode_www_form_component is slightly faster because it decodes with a precomputed Hash. The Hash has 485 entries: every upper- and lower-case spelling of the 256 %XX codes (Hash lookups are case-sensitive, so %3a and %3A need separate keys), plus one for +. CGI.unescape, by contrast, actually interprets the hex code and repacks it as a character.
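To make the table approach concrete, here is a sketch that builds such a lookup table and decodes with a single `gsub`. `DECODE_TABLE` and `table_decode` are hypothetical names, and this is written in the spirit of the approach described above rather than copied from URI's source; it also skips the invalid-escape check that URI.decode_www_form_component performs.

```ruby
# Hypothetical lookup table: '+' plus every case spelling of each %XX code.
DECODE_TABLE = { '+' => ' ' }
256.times do |byte|
  hex = format('%02X', byte)            # e.g. "3A"
  his = [hex[0], hex[0].downcase].uniq  # "3" has one spelling, "A" has two
  los = [hex[1], hex[1].downcase].uniq
  his.product(los).each { |hi, lo| DECODE_TABLE["%#{hi}#{lo}"] = byte.chr }
end

# Hypothetical decoder: one Hash lookup per match on the binary string,
# then relabel the resulting bytes with the requested encoding.
def table_decode(str, enc = Encoding::UTF_8)
  str.b.gsub(/\+|%\h\h/, DECODE_TABLE).force_encoding(enc)
end

p DECODE_TABLE.size
p table_decode('Sloths%3A+Society+and+Habitat')
```

Counting the keys confirms the arithmetic: each hex digit has 22 spellings (10 digits plus 6 letters in two cases), so 22 × 22 = 484 escape keys, plus the + entry.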

Tags:

ruby

encoding

djb
