Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to decide whether two LARGE unordered_map are equal?

Tags:

c++

stl

Given two large unordered_map, say map_a, map_b. How to effectively decide map_a has the same information as map_b? For example, if map_a is {'a':3, 'b':2} and map_b is {'a':3,'b':2} then they are the same. That is, for each key k in map_a, map_a[k]=map_b[k].

My question is how to decide this problem effectively. I know the worst time is O( max{map_a.size(), map_b.size()} ). But there are some observations which can quickly decide that map_a is not equivalent to map_b. For example, map_a.size()!=map_b.size().

Are there any other observations? Can we use the bucket_count() and bucket_size()?

W.l.o.g, let's suppose that map_a and map_b has the same hash function and (key,value) type.

like image 369
Peng Zhang Avatar asked Sep 16 '13 00:09

Peng Zhang


1 Answers

The problem is harder than it may seem, perhaps O( log( load_factor ) * size ), because the elements don't need to be in the same order in each map. (Hence unordered_map.) Each pair of corresponding buckets needs to be sorted (by hash value) before comparison.

According to 23.2.5/12,

For unordered_set and unordered_map, the complexity of operator== (i.e., the number of calls to the == operator of the value_type, to the predicate returned by key_equal(), and to the hasher returned by hash_function()) is proportional to N in the average case and to N2 in the worst case, where N is a.size(). For unordered_multiset and unordered_multimap, the complexity of operator== is proportional to ∑Ei2 in the average case and to N2 in the worst case, where N is a.size(), and Ei is the size of the ith equivalent-key group in a. However, if the respective elements of each corresponding pair of equivalent-key groups Eai and Ebi are arranged in the same order (as is commonly the case, e.g., if a and b are unmodified copies of the same container), then the average-case complexity for unordered_multiset and unordered_multimap becomes proportional to N (but worst-case complexity remains O(N2), e.g., for a pathologically bad hash function).

That's rather a lot to format properly for this site, but note that "N2" is supposed to be N2.

My log(load_factor) analysis may have been an oversimplification: I think the algorithm is actually required not to allocate memory. My recommendation is not to try this at home. You should rely on your standard library's implementation of operator ==, because it can rely on internal invariants that may not be guaranteed by the standard.

like image 83
Potatoswatter Avatar answered Oct 08 '22 19:10

Potatoswatter