What is the most efficient way to check if two hashes h1
and h2
have the same set of keys disregarding the order? Could it be made faster or more concise with close efficiency than the answer that I post?
Equality—Two hashes are equal if they each contain the same number of keys and if each key-value pair is equal to (according to Object#== ) the corresponding elements in the other hash. The orders of each hashes are not compared.
In Ruby, Hash is a collection of unique keys and their values. Hash is like an Array, except the indexing is done with the help of arbitrary keys of any object type. In Hash, the order of returning keys and their value by various iterators is arbitrary and will generally not be in the insertion order.
Combining freemasonjson's and sawa's ideas:
h1.size == h2.size and (h1.keys - h2.keys).empty?
Try:
# Check that both hash have the same number of entries first before anything
if h1.size == h2.size
# breaks from iteration and returns 'false' as soon as there is a mismatched key
# otherwise returns true
h1.keys.all?{ |key| !!h2[key] }
end
Enumerable#all?
worse case scenario, you'd only be iterating through the keys once.
Just for the sake of having at least a benchmark on this question...
require 'securerandom'
require 'benchmark'
a = {}
b = {}
# Use uuid to get a unique random key
(0..1_000).each do |i|
key = SecureRandom.uuid
a[key] = i
b[key] = i
end
Benchmark.bmbm do |x|
x.report("#-") do
1_000.times do
(a.keys - b.keys).empty? and (a.keys - b.keys).empty?
end
end
x.report("#&") do
1_000.times do
computed = a.keys & b.keys
computed.size == a.size
end
end
x.report("#all?") do
1_000.times do
a.keys.all?{ |key| !!b[key] }
end
end
x.report("#sort") do
1_000.times do
a_sorted = a.keys.sort
b_sorted = b.keys.sort
a == b
end
end
end
Results are:
Rehearsal -----------------------------------------
#- 1.000000 0.000000 1.000000 ( 1.001348)
#& 0.560000 0.000000 0.560000 ( 0.563523)
#all? 0.240000 0.000000 0.240000 ( 0.239058)
#sort 0.850000 0.010000 0.860000 ( 0.854839)
-------------------------------- total: 2.660000sec
user system total real
#- 0.980000 0.000000 0.980000 ( 0.976698)
#& 0.560000 0.000000 0.560000 ( 0.559592)
#all? 0.250000 0.000000 0.250000 ( 0.251128)
#sort 0.860000 0.000000 0.860000 ( 0.862857)
I have to agree with @akuhn that this would be a better benchmark if we had more information on the dataset you are using. But that being said, I believe this question really needed some hard fact.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With