I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.
This string will be used to compare the values in a database (Mongo, in this case).
My first thought was to create an MD5 hash of a JSON encoded value, like so: (body is the variable referred to above)
def createsig(body)
Digest::MD5.hexdigest(JSON.generate(body))
end
This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'})
does not always equal createsig({:b=>'b',:a=>'a'})
.
What is the best way to create a signature string to fit this need?
Note: For the detail oriented among us, I know that you can't JSON.generate()
a number or a string. In these cases, I would just call MD5.hexdigest()
directly.
Call MessageDigest. getInstance("MD5") to get a MD5 instance of MessageDigest you can use. The compute the hash by doing one of: Feed the entire input as a byte[] and calculate the hash in one operation with md.
Generally, two files can have the same md5 hash only if their contents are exactly the same. Even a single bit of variation will generate a completely different hash value. There is one caveat, though: An md5 sum is 128 bits (16 bytes).
You can check using the following function: function isValidMd5($md5 ='') { return preg_match('/^[a-f0-9]{32}$/', $md5); } echo isValidMd5('5d41402abc4b2a76b9719d911017c592'); The MD5 (Message-digest algorithm) Hash is typically expressed in text format as a 32 digit hexadecimal number.
I coding up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.
This should properly flatten out and sort the arrays and hashes, and you'd need to have to some pretty strange looking strings for there to be any collisions.
def createsig(body)
Digest::MD5.hexdigest( sigflat body )
end
def sigflat(body)
if body.class == Hash
arr = []
body.each do |key, value|
arr << "#{sigflat key}=>#{sigflat value}"
end
body = arr
end
if body.class == Array
str = ''
body.map! do |value|
sigflat value
end.sort!.each do |value|
str << value
end
end
if body.class != String
body = body.to_s << body.class.to_s
end
body
end
> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
If you could only get a string representation of body
and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:
require 'digest/md5'
class Object
def md5key
to_s
end
end
class Array
def md5key
map(&:md5key).join
end
end
class Hash
def md5key
sort.map(&:md5key).join
end
end
Now any object (of the types mentioned in the question) respond to md5key
by returning a reliable key to use for creating a checksum, so:
def createsig(o)
Digest::MD5.hexdigest(o.md5key)
end
Example:
body = [
{
'bar' => [
345,
"baz",
],
'qux' => 7,
},
"foo",
123,
]
p body.md5key # => "bar345bazqux7foo123"
p createsig(body) # => "3a92036374de88118faf19483fe2572e"
Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)
I did this as a refinement on Object, it's easily ported to a monkey patch. To use it just require the file and run "using ObjSum".
module ObjSum
refine Object do
def objsum
parts = []
queue = [self]
while queue.size > 0
item = queue.shift
if item.kind_of?(Hash)
parts << "\\000"
item.keys.sort.each do |k|
queue << k
queue << item[k]
end
elsif item.kind_of?(Set)
parts << "\\001"
item.to_a.sort.each { |i| queue << i }
elsif item.kind_of?(Enumerable)
parts << "\\002"
item.each { |i| queue << i }
elsif item.kind_of?(Fixnum)
parts << "\\003"
parts << item.to_s
elsif item.kind_of?(Float)
parts << "\\004"
parts << item.to_s
else
parts << item.to_s
end
end
Digest::MD5.hexdigest(parts.join)
end
end
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With