I have a data structure (a Hash) that looks something like this:
{
  foo: "Test string",
  bar: [475934759, 5619827847]
}
I'm trying to create a checksum from that Hash to check for equality in the future. I tried using the hash method of the Hash, which resulted in a satisfyingly nice-looking hash, but it turns out that the same Hash will produce a different hash after the interpreter has been restarted.
I really just want to be able to create a ~128-bit checksum from a Hash, String or Array instance.
Is this possible?
You could calculate your own hash based on the object's Marshal dump or JSON representation.
This calculates the MD5 hash of a Marshal dump:
require 'digest/md5'
hash = {
  foo: "Test string",
  bar: [475934759, 5619827847]
}
Marshal::dump(hash)
#=> "\x04\b{\a:\bfooI\"\x10Test string\x06:\x06ET:\bbar[\ai\x04'0^\x1Cl+\b\x87\xC4\xF7N\x01\x00"
Digest::MD5.hexdigest(Marshal::dump(hash))
#=> "1b6308abdd8f5f6290e2825a078a1a02"
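The JSON variant mentioned above could look roughly like the following. This is only a sketch: json_checksum is a made-up helper name, and it assumes a reasonably recent json library. Note that JSON turns symbol keys into strings, so { foo: 1 } and { "foo" => 1 } end up with the same checksum, and the result still depends on the insertion order of the keys.

```ruby
require 'digest/md5'
require 'json'

# Hypothetical helper (not part of the original answer):
# MD5 of the object's JSON text, giving a 128-bit checksum as 32 hex digits.
def json_checksum(obj)
  Digest::MD5.hexdigest(JSON.generate(obj))
end

data = {
  foo: "Test string",
  bar: [475934759, 5619827847]
}

puts json_checksum(data)
puts json_checksum(data[:bar])
```

Unlike a Marshal dump, the JSON text can be reproduced by non-Ruby programs, which may matter if the checksum has to be verified elsewhere.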
Update
You can implement your own strategy, although I would not recommend changing core functionality:
class Hash
  def _dump(depth)
    # this doesn't cause a recursion because sort returns an array
    Marshal::dump(self.sort, depth)
  end

  def self._load(marshaled_hash)
    Hash[Marshal::load(marshaled_hash)]
  end
end
Marshal::dump({foo:1, bar:2})
#=> "\x04\bu:\tHash\e\x04\b[\a[\a:\bbari\a[\a:\bfooi\x06"
Marshal::dump({bar:2, foo:1})
#=> "\x04\bu:\tHash\e\x04\b[\a[\a:\bbari\a[\a:\bfooi\x06"
Marshal::load(Marshal::dump({foo:1, bar:2}))
#=> {:bar=>2, :foo=>1}
To build on @Stefan's answer above: if the order of the keys should not affect the checksum, sort the key/value pairs before pushing them through Marshal.
require 'digest/md5'
hash = {
  'foo' => "Test string",
  'bar' => [475934759, 5619827847]
}
# sort_by on a Hash yields an array of [key, value] pairs ordered by key
puts Digest::MD5.hexdigest(Marshal::dump(hash.sort_by { |k, _v| k }))
# 8509c564c0ae8dcb6c2b9b564ba6a03f
hash = {
  'bar' => [475934759, 5619827847],
  'foo' => "Test string"
}
# same pairs in a different insertion order; sorting by key makes the dump identical
puts Digest::MD5.hexdigest(Marshal::dump(hash.sort_by { |k, _v| k }))
# 8509c564c0ae8dcb6c2b9b564ba6a03f
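If you need to generate the checksum for the content of the hash, whatever the order of the data, a plain Marshal dump or a top-level sort is not enough: the dump depends on insertion order, sorting raises an error when the keys have mixed types (a String cannot be compared with a Symbol), and neither reorders nested hashes.
The only approach I have found so far that tolerates all of this is to sort the characters of the hash's string representation. Be aware that this throws away structure, so hashes made of the same characters in different positions, such as { a: 1, b: 2 } and { a: 2, b: 1 }, get the same checksum: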
require 'digest/md5'
hash1 = { "a" => 1, "b" => "2", c: { d: "3" } }
hash2 = { c: { d: "3" }, "a" => 1, "b" => "2" }
Digest::MD5.hexdigest(Marshal.dump(hash1)) # => "5def3b2cbdddd3aa6730b6d0527c2d79"
Digest::MD5.hexdigest(Marshal.dump(hash2)) # => "8155698ccfb05b8db01490e9b9634fd9"
Digest::MD5.hexdigest(hash1.to_s.chars.sort.join) # => "812bb65d65380fc1e620a9596806cc35"
Digest::MD5.hexdigest(hash2.to_s.chars.sort.join) # => "812bb65d65380fc1e620a9596806cc35"
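If keeping the structure matters, another option is to normalise the hash recursively before dumping it. The following is a rough sketch rather than a tested library: canonicalize and content_checksum are made-up names, the :__hash__ marker is an arbitrary tag I chose to keep a Hash distinguishable from an Array of pairs, and it assumes the data consists only of hashes, arrays and plain values such as strings, symbols and numbers.

```ruby
require 'digest/md5'

# Hypothetical helper: rewrites every Hash (at any depth) into a tagged,
# sorted array of [key, value] pairs so insertion order no longer matters.
def canonicalize(obj)
  case obj
  when Hash
    pairs = obj.map { |k, v| [canonicalize(k), canonicalize(v)] }
    # Sort by each key's Marshal dump instead of the key itself, so that
    # String and Symbol keys can be mixed without a comparison error.
    [:__hash__, pairs.sort_by { |k, _v| Marshal.dump(k) }]
  when Array
    obj.map { |e| canonicalize(e) }
  else
    obj
  end
end

# Hypothetical helper: 128-bit MD5 checksum of the canonical form.
def content_checksum(obj)
  Digest::MD5.hexdigest(Marshal.dump(canonicalize(obj)))
end

hash1 = { "a" => 1, "b" => "2", c: { d: "3" } }
hash2 = { c: { d: "3" }, "a" => 1, "b" => "2" }

puts content_checksum(hash1) == content_checksum(hash2)               # prints true
puts content_checksum({ a: 1, b: 2 }) == content_checksum({ a: 2, b: 1 }) # prints false
```

One caveat with anything built on Marshal: it writes a back-reference when the very same object appears twice, so two structures that are equal by value can still dump differently if one of them reuses a single String instance where the other has two separate ones.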