I have some vector of integer that I would like to store efficiently in a unordered_map in c++11 my question is this:
How do I best store these and optimize for .find
queries?
I came up with the following hasher:
class uint32_vector_hasher { public: std::size_t operator()(std::vector<uint32_t> const& vec) const { std::size_t ret = 0; for(auto& i : vec) { ret ^= std::hash<uint32_t>()(i); } return ret; } };
and then store the objects in an unordered_map
I do however have a couple of questions
==
and hash functions to make memorize the hash and avoid it being calculated more than once?When profiling I've noticed that a rather large amount of my cpu time is spend doing lookups on the unordered maps, this is not exactly optimal :(
Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions. will provide uniform hashing.
Unlike a set of vectors, vectors are not arranged in any particular order in an unordered set of vectors. By default, C++ doesn't provide us the facility to create an unordered set of vectors. We are required to pass a hash function using which one can easily create an unordered set of vectors.
In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix.
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table.
So, when not wanting to use boost, Michael Blurr's comment led to the following hash function implementation:
std::size_t operator()(std::vector<uint32_t> const& vec) const { std::size_t seed = vec.size(); for(auto& i : vec) { seed ^= i + 0x9e3779b9 + (seed << 6) + (seed >> 2); } return seed; }
Seems to work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With