Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A good hash function for a vector

Tags:

c++

c++11

hash

I have some vector of integer that I would like to store efficiently in a unordered_map in c++11 my question is this:

How do I best store these and optimize for .find queries?

I came up with the following hasher:

class uint32_vector_hasher { public:   std::size_t operator()(std::vector<uint32_t> const& vec) const {     std::size_t ret = 0;     for(auto& i : vec) {       ret ^= std::hash<uint32_t>()(i);     }     return ret;   } }; 

and then store the objects in an unordered_map I do however have a couple of questions

  1. how often does the hash get calculated, only one, some random number or times?
  2. Would it make sense to create a wrapper object with == and hash functions to make memorize the hash and avoid it being calculated more than once?

When profiling I've noticed that a rather large amount of my cpu time is spend doing lookups on the unordered maps, this is not exactly optimal :(

like image 460
Martin Kristiansen Avatar asked Dec 11 '13 05:12

Martin Kristiansen


People also ask

What is a good choice for a hash function?

Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions. will provide uniform hashing.

Can I hash a vector in C++?

Unlike a set of vectors, vectors are not arranged in any particular order in an unordered set of vectors. By default, C++ doesn't provide us the facility to create an unordered set of vectors. We are required to pass a hash function using which one can easily create an unordered set of vectors.

What is a hash vector?

In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix.

What is the purpose of using a function to hash vectors into values?

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table.


1 Answers

So, when not wanting to use boost, Michael Blurr's comment led to the following hash function implementation:

std::size_t operator()(std::vector<uint32_t> const& vec) const {   std::size_t seed = vec.size();   for(auto& i : vec) {     seed ^= i + 0x9e3779b9 + (seed << 6) + (seed >> 2);   }   return seed; } 

Seems to work.

like image 103
HolKann Avatar answered Sep 30 '22 18:09

HolKann