Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating an md5 hash of a number, string, array, or hash in Ruby

Tags:

ruby

md5

digest

I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.

This string will be used to compare the values in a database (Mongo, in this case).

My first thought was to create an MD5 hash of a JSON encoded value, like so: (body is the variable referred to above)

def createsig(body)    
  Digest::MD5.hexdigest(JSON.generate(body))
end

This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'}) does not always equal createsig({:b=>'b',:a=>'a'}).

What is the best way to create a signature string to fit this need?

Note: For the detail oriented among us, I know that you can't JSON.generate() a number or a string. In these cases, I would just call MD5.hexdigest() directly.

like image 886
TelegramSam Avatar asked Jun 23 '11 23:06

TelegramSam


People also ask

How do you generate md5 hash of a string?

Call MessageDigest. getInstance("MD5") to get a MD5 instance of MessageDigest you can use. The compute the hash by doing one of: Feed the entire input as a byte[] and calculate the hash in one operation with md.

Can two strings have same md5?

Generally, two files can have the same md5 hash only if their contents are exactly the same. Even a single bit of variation will generate a completely different hash value. There is one caveat, though: An md5 sum is 128 bits (16 bytes).

How do I check if a string is an md5?

You can check using the following function: function isValidMd5($md5 ='') { return preg_match('/^[a-f0-9]{32}$/', $md5); } echo isValidMd5('5d41402abc4b2a76b9719d911017c592'); The MD5 (Message-digest algorithm) Hash is typically expressed in text format as a 32 digit hexadecimal number.


3 Answers

I coding up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.

This should properly flatten out and sort the arrays and hashes, and you'd need to have to some pretty strange looking strings for there to be any collisions.

def createsig(body)
  Digest::MD5.hexdigest( sigflat body )
end

def sigflat(body)
  if body.class == Hash
    arr = []
    body.each do |key, value|
      arr << "#{sigflat key}=>#{sigflat value}"
    end
    body = arr
  end
  if body.class == Array
    str = ''
    body.map! do |value|
      sigflat value
    end.sort!.each do |value|
      str << value
    end
  end
  if body.class != String
    body = body.to_s << body.class.to_s
  end
  body
end

> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
like image 171
Luke Avatar answered Oct 26 '22 00:10

Luke


If you could only get a string representation of body and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:

require 'digest/md5'

class Object
  def md5key
    to_s
  end
end

class Array
  def md5key
    map(&:md5key).join
  end
end

class Hash
  def md5key
    sort.map(&:md5key).join
  end
end

Now any object (of the types mentioned in the question) respond to md5key by returning a reliable key to use for creating a checksum, so:

def createsig(o)
  Digest::MD5.hexdigest(o.md5key)
end

Example:

body = [
  {
    'bar' => [
      345,
      "baz",
    ],
    'qux' => 7,
  },
  "foo",
  123,
]
p body.md5key        # => "bar345bazqux7foo123"
p createsig(body)    # => "3a92036374de88118faf19483fe2572e"

Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].

like image 15
Wayne Conrad Avatar answered Oct 25 '22 23:10

Wayne Conrad


Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)

I did this as a refinement on Object, it's easily ported to a monkey patch. To use it just require the file and run "using ObjSum".

module ObjSum
  refine Object do
    def objsum
      parts = []
      queue = [self]

      while queue.size > 0
        item = queue.shift

        if item.kind_of?(Hash)
          parts << "\\000"
          item.keys.sort.each do |k| 
            queue << k
            queue << item[k]
          end
        elsif item.kind_of?(Set)
          parts << "\\001"
          item.to_a.sort.each { |i| queue << i }
        elsif item.kind_of?(Enumerable)
          parts << "\\002"
          item.each { |i| queue << i }
        elsif item.kind_of?(Fixnum)
          parts << "\\003"
          parts << item.to_s
        elsif item.kind_of?(Float)
          parts << "\\004"
          parts << item.to_s
        else
          parts << item.to_s
        end
      end

      Digest::MD5.hexdigest(parts.join)
    end
  end
end
like image 1
Greg Fodor Avatar answered Oct 26 '22 00:10

Greg Fodor