Given two large unordered_maps, say map_a and map_b, how can I efficiently decide whether map_a holds the same information as map_b?
For example, if map_a is {'a': 3, 'b': 2} and map_b is {'a': 3, 'b': 2}, then they are the same. That is, the two maps contain the same set of keys, and for each key k, map_a[k] == map_b[k].
My question is how to decide this efficiently. I know a full element-by-element check costs O( max{map_a.size(), map_b.size()} ) on average. But there are some observations that can quickly establish that map_a is not equivalent to map_b, for example map_a.size() != map_b.size().
Are there any other such observations? Can we use bucket_count() and bucket_size()?
W.l.o.g., let's suppose that map_a and map_b have the same hash function and (key, value) types.
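For concreteness, here is a minimal sketch of the straightforward check I have in mind (the helper name same_contents and the std::string/int key and value types are illustrative assumptions): apply the size() quick reject first, then do one expected-O(1) lookup per key of map_a.

    #include <string>
    #include <unordered_map>

    // Illustrative helper: true iff both maps hold exactly the same (key, value) pairs.
    bool same_contents(const std::unordered_map<std::string, int>& map_a,
                       const std::unordered_map<std::string, int>& map_b)
    {
        // Quick reject: maps of different sizes can never be equivalent.
        if (map_a.size() != map_b.size())
            return false;

        // One expected-O(1) lookup per element of map_a.
        for (const auto& [key, value] : map_a) {
            auto it = map_b.find(key);
            if (it == map_b.end() || it->second != value)
                return false;
        }
        return true;
    }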
The problem is harder than it may seem, perhaps O( log(load_factor) * size ), because the elements don't need to be in the same order in each map (hence unordered_map). Each pair of corresponding buckets needs to be sorted (by hash value) before comparison.
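To make the ordering point concrete (a small sketch with assumed contents; the iteration order is unspecified and may or may not actually differ on a given implementation), a naive positional comparison such as std::equal over the two ranges can report a mismatch even though the maps hold identical information:

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    int main()
    {
        // Same contents, inserted in different orders; the buckets the elements
        // land in, and hence the iteration order, are unspecified and may differ.
        std::unordered_map<std::string, int> map_a{{"a", 3}, {"b", 2}};
        std::unordered_map<std::string, int> map_b{{"b", 2}, {"a", 3}};

        // Positional comparison pairs elements by iteration position, not by key,
        // so this may print 0 even though the two maps are equivalent.
        std::cout << std::equal(map_a.begin(), map_a.end(), map_b.begin()) << '\n';
    }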
According to 23.2.5/12,
For unordered_set and unordered_map, the complexity of operator== (i.e., the number of calls to the == operator of the value_type, to the predicate returned by key_equal(), and to the hasher returned by hash_function()) is proportional to N in the average case and to N² in the worst case, where N is a.size(). For unordered_multiset and unordered_multimap, the complexity of operator== is proportional to Σ E_i² in the average case and to N² in the worst case, where N is a.size(), and E_i is the size of the i-th equivalent-key group in a. However, if the respective elements of each corresponding pair of equivalent-key groups E_ai and E_bi are arranged in the same order (as is commonly the case, e.g., if a and b are unmodified copies of the same container), then the average-case complexity for unordered_multiset and unordered_multimap becomes proportional to N (but worst-case complexity remains O(N²), e.g., for a pathologically bad hash function).
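As an illustration of where the O(N²) worst case comes from (the ConstantHash functor and the sizes below are assumptions made for the example), a pathologically bad hash function puts every element into one bucket, so each per-key lookup degenerates into a linear scan:

    #include <cstddef>
    #include <unordered_map>

    // Illustrative pathological hasher: every key lands in the same bucket.
    struct ConstantHash {
        std::size_t operator()(int) const noexcept { return 0; }
    };

    int main()
    {
        std::unordered_map<int, int, ConstantHash> map_a, map_b;
        for (int i = 0; i < 2000; ++i) {
            map_a.emplace(i, i);
            map_b.emplace(1999 - i, 1999 - i);
        }

        // With all elements colliding into one bucket, each lookup degenerates
        // into a linear scan, so the comparison costs on the order of N^2.
        bool equal = (map_a == map_b);   // true, but slowly
        (void)equal;
    }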
My log(load_factor) analysis may have been an oversimplification: I think the algorithm is actually required not to allocate memory. My recommendation is not to try this at home. You should rely on your standard library's implementation of operator==, because it can rely on internal invariants that may not be guaranteed by the standard.
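In practice that recommendation amounts to nothing more than the following (the contents here are the example values from the question):

    #include <string>
    #include <unordered_map>

    int main()
    {
        std::unordered_map<std::string, int> map_a{{"a", 3}, {"b", 2}};
        std::unordered_map<std::string, int> map_b{{"b", 2}, {"a", 3}};

        // operator== compares the contents key by key, independently of bucket
        // layout or iteration order; the average-case cost is O(map_a.size()).
        bool same = (map_a == map_b);   // true
        (void)same;
    }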