
Ruby Marshal.dump gives different results for what looks like the same thing

I'm seeing slightly different results from Ruby's Marshal.dump depending on whether I call .to_s on something or type the characters as a literal. I'm not clear on what's happening here:

»  Marshal.dump(1.to_s)
=> "\x04\bI\"\x061\x06:\x06EF"
»  Marshal.dump('1')
=> "\x04\bI\"\x061\x06:\x06ET"
»  1.to_s == '1'
=> true

So although it appears that 1.to_s == '1', they don't dump out to the same thing, but the only difference is in the very last byte. Any ideas why this is happening and how I can get both things to dump to the same byte sequence?

G Gordon Worley III asked Oct 12 '25 09:10



1 Answer

Marshal.load("\x04\bI\"\x061\x06:\x06EF").encoding
# => #<Encoding:US-ASCII> 
Marshal.load("\x04\bI\"\x061\x06:\x06ET").encoding
# => #<Encoding:UTF-8>

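By default, 1.to_s.encoding is not the same as '1'.encoding: Integer#to_s returns a US-ASCII string, while a string literal takes the encoding of the source file, which is UTF-8 by default. Marshal records that encoding in the dump, and that is the trailing byte you see: F for US-ASCII, T for UTF-8. Both strings contain only 7-bit ASCII characters, so Ruby treats them as comparable and '1' == 1.to_s returns true even though the encodings differ. But they are not the same thing, so they do not dump to the same bytes. To get identical dumps, give both strings the same encoding first: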
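To make the link between the encoding and the trailing byte concrete, here is a minimal sketch. The helper name encoding_flag is made up for illustration, and it is only meaningful for US-ASCII and UTF-8 strings, which are the two cases discussed here. The UTF-8 string is built with force_encoding rather than a literal so the result does not depend on the source encoding.

```ruby
# Hypothetical helper: return the last byte of the dump. For a US-ASCII or
# UTF-8 String this byte is the value of Marshal's encoding flag
# ("F" = US-ASCII, "T" = UTF-8). Other encodings are stored differently.
def encoding_flag(str)
  Marshal.dump(str)[-1]
end

ascii = 1.to_s                                     # Integer#to_s yields US-ASCII
utf8  = ascii.dup.force_encoding(Encoding::UTF_8)  # same bytes, relabelled

puts ascii.encoding        # US-ASCII
puts encoding_flag(ascii)  # F
puts utf8.encoding         # UTF-8
puts encoding_flag(utf8)   # T
puts ascii == utf8         # true: both are ASCII-only, so they compare equal
```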

Marshal.dump(1.to_s.force_encoding('utf-8'))
# => "\x04\bI\"\x061\x06:\x06ET"
Marshal.dump('1')
# => "\x04\bI\"\x061\x06:\x06ET"

(This assumes Ruby 2.0 or later, where the default source encoding is UTF-8, and that you haven't changed the source encoding, for example with a magic comment.)
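If you want this applied consistently, one option is to normalize every string just before dumping. The sketch below is an assumption about how you might wrap it, not anything built into Marshal, and dump_as_utf8 is a made-up name. It uses String#encode, which returns a new string, instead of force_encoding, which relabels the receiver in place. For ASCII-only data both produce the same bytes.

```ruby
# Hypothetical wrapper: convert the string to UTF-8 (without touching the
# original object) and dump the result.
def dump_as_utf8(str)
  Marshal.dump(str.encode(Encoding::UTF_8))
end

a = dump_as_utf8(1.to_s)  # input is US-ASCII
b = dump_as_utf8('1')     # input has the source encoding
puts a == b               # true
```

Note that encode raises if the input contains bytes that are invalid in its current encoding, so this only suits strings you know to be well-formed.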

Amadan answered Oct 15 '25 00:10
