
Creating unordered_set of unordered_set

Tags: c++, c++11, hash

I want to create a container that will store unique sets of integers inside.

I want to create something similar to

std::unordered_set<std::unordered_set<unsigned int>>

But g++ does not let me do that and says:

invalid use of incomplete type 'struct std::hash<std::unordered_set<unsigned int> >'

What I want to achieve is to have unique sets of unsigned ints.

How can I do that?

asked Jan 03 '15 16:01 by peku33


3 Answers

I'm adding yet another answer to this question as currently no one has touched upon a key point.

Everyone is telling you that you need to create a hash function for unordered_set<unsigned>, and this is correct. You can do so by specializing std::hash<unordered_set<unsigned>>, or you can create your own functor and use it like this:

unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor> s;
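For illustration, here is one possible body for my_unordered_set_hash_functor. It is a minimal sketch that assumes the element hashes are combined by addition. The functor name comes from this answer, but the implementation and the helper names (inner_set, set_of_sets, count_distinct_example) are hypothetical. It uses a separate functor rather than a std::hash specialization because the standard only allows a program to specialize a std class template when the declaration depends on a program-defined type, and std::hash<std::unordered_set<unsigned>> does not.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

using inner_set = std::unordered_set<unsigned>;

// Hypothetical functor (name from the answer, body is a sketch): sums the
// std::hash values of the elements. Addition is commutative, so the result
// does not depend on the order in which the set happens to iterate.
struct my_unordered_set_hash_functor {
    std::size_t operator()(const inner_set& s) const {
        std::hash<unsigned> h;
        std::size_t sum = 0;
        for (unsigned x : s) sum += h(x);  // wraps on overflow, fine for a hash
        return sum;
    }
};

// The default equality predicate (operator== on unordered_set) is kept.
using set_of_sets = std::unordered_set<inner_set, my_unordered_set_hash_functor>;

// Usage: {1, 2, 3} and {3, 2, 1} compare equal, so only one copy is kept.
inline std::size_t count_distinct_example() {
    set_of_sets s;
    s.insert(inner_set{1, 2, 3});
    s.insert(inner_set{3, 2, 1});  // equal to the first set, not inserted
    s.insert(inner_set{1, 2});
    return s.size();               // 2
}
```

Because the sum is the same however the elements are visited, this functor already has the order-independence property this answer goes on to require.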

Either way is fine. However there is a big problem you need to watch out for:

For any two unordered_set<unsigned> that compare equal (x == y), they must hash to the same value: hash(x) == hash(y). If you break this rule, the container will misbehave at run time: for example, a set equal to one already stored may be inserted a second time, or a lookup may fail to find it. Also note that the following two unordered_sets compare equal (using pseudo code here for clarity):

{1, 2, 3} == {3, 2, 1}

Therefore hash({1, 2, 3}) must equal hash({3, 2, 1}). Said differently, the unordered containers have an equality operator where order does not matter. So however you construct your hash function, its result must be independent of the order of the elements in the container.

Alternatively you can replace the equality predicate used in the unordered_set such that it does respect order:

unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor,
                                       my_unordered_equal> s;

The burden of getting all of this right makes:

unordered_set<set<unsigned>, my_set_hash_functor>

look fairly attractive. You still have to create a hash functor for set<unsigned>, but now you don't have to worry about getting the same hash code for {1, 2, 3} and {3, 2, 1}: a set<unsigned> always stores (and iterates) its elements in sorted order, so both spellings produce the same set, and an ordinary order-dependent hash over the elements is fine.
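A sketch of my_set_hash_functor, assuming a mixing step in the style of boost::hash_combine. The functor name comes from the answer; the body, the constant 0x9e3779b9 and the helper names (ordered_set_of_sets, count_distinct_ordered_example) are assumptions. Since std::set iterates in ascending order, equal sets feed the same sequence into the combine, so an order-dependent combine is safe here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// Hypothetical functor for std::set<unsigned>. Equal sets visit the same
// elements in the same (sorted) sequence, so order-dependent mixing is OK.
struct my_set_hash_functor {
    std::size_t operator()(const std::set<unsigned>& s) const {
        std::hash<unsigned> h;
        std::size_t seed = 0;
        for (unsigned x : s) {
            // boost::hash_combine-style step (an assumption; any reasonable
            // order-dependent combine would do).
            seed ^= h(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

using ordered_set_of_sets =
    std::unordered_set<std::set<unsigned>, my_set_hash_functor>;

inline std::size_t count_distinct_ordered_example() {
    ordered_set_of_sets s;
    s.insert(std::set<unsigned>{1, 2, 3});
    s.insert(std::set<unsigned>{3, 2, 1});  // same set once sorted, ignored
    s.insert(std::set<unsigned>{1, 2});
    return s.size();                        // 2
}
```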

I note that Walter's answer gives a hash functor that has the right behavior: it ignores order in computing the hash code. But then his answer (currently) tells you that this is not a good solution. :-) It actually is a good solution for unordered containers. An even better solution would be to return the sum of the individual hashes instead of hashing the sum of the elements.
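A sketch of the "sum of the individual hashes" variant. One caveat: on some standard libraries (libstdc++ and libc++, for example) std::hash<unsigned> is simply the identity, in which case summing std::hash values is no different from hashing the sum. This sketch therefore runs each element through a mixing function before adding. mix64 is a hypothetical helper using the SplitMix64 finalizer constants; that choice is an assumption, and any good integer mixer would serve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

using inner_set = std::unordered_set<unsigned>;

// Hypothetical mixer (SplitMix64 finalizer). Applied per element so that
// nearby integers get very different hash contributions.
inline std::uint64_t mix64(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Sum of the individual (mixed) element hashes: addition keeps the result
// independent of iteration order, while the mixing means sets with equal
// element sums are no longer forced onto the same hash value.
struct sum_of_hashes {
    std::size_t operator()(const inner_set& s) const {
        std::uint64_t sum = 0;
        for (unsigned x : s) sum += mix64(x);
        return static_cast<std::size_t>(sum);
    }
};

using mixed_set_of_sets = std::unordered_set<inner_set, sum_of_hashes>;
```

With this functor, {1, 4} and {2, 3} no longer necessarily collide, whereas a hash of the element sum maps both to the hash of 5.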

answered Oct 22 '22 05:10 by Howard Hinnant


You can do this, but like every unordered_set/unordered_map key type, the inner unordered_set now needs a hash function. It does not have one by default, but you can write one yourself.

answered Oct 22 '22 07:10 by Lightness Races in Orbit


What you have to do is define an appropriate hash for keys of type std::unordered_set<unsigned int>. Since operator== is already defined for this key type, you do not also need to provide the EqualKey template parameter of std::unordered_set<std::unordered_set<unsigned int>, Hash, EqualKey>.

One simple (albeit inefficient) option is to hash on the total sum of all elements of the set. This would look similar to this:

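#include <cstddef>
#include <functional>
#include <numeric>
#include <unordered_set>

// Hashes a container by hashing the sum of its elements. The sum does not
// depend on iteration order, so sets that compare equal hash equally.
template<typename T>
struct hash_on_sum
: private std::hash<typename T::value_type>
{
  typedef typename T::value_type count_type;
  typedef std::hash<count_type> base;
  std::size_t operator()(T const& obj) const
  {
    return base::operator()(std::accumulate(obj.begin(), obj.end(), count_type()));
  }
};

typedef std::unordered_set<unsigned int> inner_type;
typedef std::unordered_set<inner_type, hash_on_sum<inner_type>> set_of_unique_sets;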

However, while simple, this is not a good hash: many different sets have the same sum (for example {1, 4} and {2, 3}), so it does not meet the following expectation. For two keys k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max().
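To make the collision problem concrete, a small experiment: enumerate every subset of {1, ..., 10} and count how many distinct values a hash-on-sum produces for them. The names hash_on_sum_value and distinct_hashes_for_subsets_of_1_to_10 are hypothetical; hash_on_sum_value is a non-template restatement of the hash_on_sum idea for unordered_set<unsigned int>.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <unordered_set>

using inner_set = std::unordered_set<unsigned>;

// Same idea as hash_on_sum: hash the sum of the elements.
inline std::size_t hash_on_sum_value(const inner_set& s) {
    return std::hash<unsigned>()(std::accumulate(s.begin(), s.end(), 0u));
}

// Build all 2^10 = 1024 subsets of {1, ..., 10} (one per bitmask) and
// collect the distinct hash values they produce.
inline std::size_t distinct_hashes_for_subsets_of_1_to_10() {
    std::unordered_set<std::size_t> seen;
    for (unsigned mask = 0; mask < (1u << 10); ++mask) {
        inner_set subset;
        for (unsigned bit = 0; bit < 10; ++bit)
            if (mask & (1u << bit)) subset.insert(bit + 1);
        seen.insert(hash_on_sum_value(subset));
    }
    return seen.size();
}
```

Every one of the 1024 subsets has a sum between 0 and 55, so together they share at most 56 hash values, about 18 sets per value on average.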

answered Oct 22 '22 06:10 by Walter