
Is there a quick and easy way to create a checksum from Ruby's basic data structures?

I have a data structure (Hash) that looks something like this:

{
    foo: "Test string",
    bar: [475934759, 5619827847]
}

I'm trying to create a checksum from that Hash to check for equality in the future. I tried the Hash's own hash method, which gave a satisfyingly nice-looking value, but it turns out that the same Hash produces a different value once the interpreter has been restarted.

I really just want to be able to create a ~128-bit checksum from a Hash, String or Array instance.

Is this possible?

Hubro asked Oct 15 '13 11:10


3 Answers

You could calculate your own hash based on the object's Marshal dump or JSON representation.

This calculates the MD5 hash of a Marshal dump:

require 'digest/md5'

hash = {
  foo: "Test string",
  bar: [475934759, 5619827847]
}

Marshal::dump(hash)
#=> "\x04\b{\a:\bfooI\"\x10Test string\x06:\x06ET:\bbar[\ai\x04'0^\x1Cl+\b\x87\xC4\xF7N\x01\x00"

Digest::MD5.hexdigest(Marshal::dump(hash))
#=> "1b6308abdd8f5f6290e2825a078a1a02"
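
The JSON route mentioned above is not shown, so here is a minimal sketch of it. The helper name json_checksum is made up for illustration, and the approach assumes the data contains only JSON-friendly values (strings, numbers, arrays, hashes, true, false, nil).

```ruby
require 'digest/md5'
require 'json'

# Hypothetical helper (not part of Ruby): MD5 of the object's JSON text.
# MD5 yields 128 bits, i.e. 32 hex characters.
def json_checksum(obj)
  Digest::MD5.hexdigest(JSON.generate(obj))
end

hash = {
  foo: "Test string",
  bar: [475934759, 5619827847]
}

puts JSON.generate(hash)
# prints {"foo":"Test string","bar":[475934759,5619827847]}

puts json_checksum(hash).length
# prints 32

# Caveat: symbol keys and string keys serialise to the same JSON text,
# so these two hashes get the same checksum.
puts json_checksum({ foo: 1 }) == json_checksum({ "foo" => 1 })
# prints true
```

Like the Marshal version, this still depends on the insertion order of the keys, because JSON.generate emits them in that order.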

Update

You can implement your own dump strategy so that key order no longer matters, although I would not recommend changing core functionality:

class Hash
  def _dump(depth)
    # this doesn't cause a recursion because sort returns an array
    Marshal::dump(self.sort, depth)
  end

  def self._load(marshaled_hash)
    Hash[Marshal::load(marshaled_hash)]
  end
end

Marshal::dump({foo:1, bar:2})
#=> "\x04\bu:\tHash\e\x04\b[\a[\a:\bbari\a[\a:\bfooi\x06"

Marshal::dump({bar:2, foo:1})
#=> "\x04\bu:\tHash\e\x04\b[\a[\a:\bbari\a[\a:\bfooi\x06"

Marshal::load(Marshal::dump({foo:1, bar:2}))
#=> {:bar=>2, :foo=>1}
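
If reopening Hash feels too invasive, the same idea can probably be kept in a standalone helper. This is only a sketch: stable_checksum is a made-up name, it sorts the top level only, and it assumes the keys are mutually comparable (all symbols or all strings, for example).

```ruby
require 'digest/md5'

# Hypothetical helper (not part of Ruby): checksum of a Hash, String or Array
# without touching the Hash class itself.
def stable_checksum(obj)
  # Hash#sort returns an array of [key, value] pairs ordered by key,
  # so top-level insertion order no longer affects the dump.
  canonical = obj.is_a?(Hash) ? obj.sort : obj
  Digest::MD5.hexdigest(Marshal.dump(canonical))
end

puts stable_checksum({ foo: 1, bar: 2 }) == stable_checksum({ bar: 2, foo: 1 })
# prints true

puts stable_checksum("Test string").length
# prints 32
```

One consequence of this design is that a Hash and an Array of the same sorted pairs get the same checksum, which may or may not matter for your use case.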

Stefan answered Nov 12 '22 14:11


To build on @Stefan's answer above: if the checksum must not depend on the order of the keys, sort the key-value pairs before pushing them through Marshal.

require 'digest/md5'

hash = {
  'foo'=> "Test string",
  'bar'=> [475934759, 5619827847]
}

puts Digest::MD5.hexdigest(Marshal::dump(hash.sort_by { |k, _v| k }))
# 8509c564c0ae8dcb6c2b9b564ba6a03f

hash = {
  'bar'=> [475934759, 5619827847],
  'foo'=> "Test string"
}

puts Digest::MD5.hexdigest(Marshal::dump(hash.sort_by { |k, _v| k }))
# 8509c564c0ae8dcb6c2b9b564ba6a03f

mcfinnigan answered Nov 12 '22 13:11


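If you need a checksum of the hash's content regardless of the order of the data, a plain Marshal dump or a simple sort won't work: the Marshal dump depends on insertion order, and sorting the pairs raises an ArgumentError when the keys are of mixed types (strings and symbols cannot be compared with each other) and does not reorder nested hashes.

The only approach I have found so far that ignores order is the following. It sorts the characters of the string representation, so be aware that two hashes whose representations are anagrams of each other (for example { a: 12 } and { a: 21 }) also get the same checksum: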

require 'digest/md5'

hash1 = { "a" => 1, "b" => "2", c: { d: "3" } }
hash2 = { c: { d: "3" }, "a" => 1, "b" => "2" }

Digest::MD5.hexdigest(Marshal.dump(hash1)) # => "5def3b2cbdddd3aa6730b6d0527c2d79"
Digest::MD5.hexdigest(Marshal.dump(hash2)) # => "8155698ccfb05b8db01490e9b9634fd9"

Digest::MD5.hexdigest(hash1.to_s.chars.sort.join) # => "812bb65d65380fc1e620a9596806cc35"
Digest::MD5.hexdigest(hash2.to_s.chars.sort.join) # => "812bb65d65380fc1e620a9596806cc35"
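
A possible alternative that ignores order but keeps the structure intact is to canonicalize the data recursively before dumping it. The sketch below is an assumption-laden illustration, not a tested library: canonicalize and content_checksum are made-up names, and it is meant for basic data (hashes, arrays, strings, symbols, numbers) only.

```ruby
require 'digest/md5'

# Hypothetical helper: rebuild the structure with every Hash replaced by a
# tagged array of [key, value] pairs in a deterministic order.
def canonicalize(obj)
  case obj
  when Hash
    pairs = obj.map { |k, v| [canonicalize(k), canonicalize(v)] }
    # Sort by the serialised key, so mixed key types (String, Symbol, ...)
    # never have to be compared with <=>.
    [:hash, pairs.sort_by { |k, _v| Marshal.dump(k) }]
  when Array
    # Array order is treated as meaningful and left untouched.
    [:array, obj.map { |e| canonicalize(e) }]
  else
    obj
  end
end

# Hypothetical helper: MD5 of the canonical form's Marshal dump.
def content_checksum(obj)
  Digest::MD5.hexdigest(Marshal.dump(canonicalize(obj)))
end

hash1 = { "a" => 1, "b" => "2", c: { d: "3" } }
hash2 = { c: { d: "3" }, "a" => 1, "b" => "2" }

puts content_checksum(hash1) == content_checksum(hash2)        # prints true
puts content_checksum({ a: 12 }) == content_checksum({ a: 21 }) # prints false
```

The :hash and :array tags are there so that a Hash and an Array of pairs with the same content do not end up with the same checksum.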

Happynoff answered Nov 12 '22 12:11