C++ some questions on boost::unordered_map & boost::hash

Tags:

I've only recently started dwelling into boost and it's containers, and I read a few articles on the web and on stackoverflow that a boost::unordered_map is the fastest performing container for big collections. So, I have this class State, which must be unique in the container (no duplicates) and there will be millions if not billions of states in the container. Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it. In my case, the State descibes sensorimotor information from a robot, so there can be an enormous amount of states, and therefore fast lookup is of topemost priority. Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function. So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value. The operator== compares basically the state's hash values. So:

is std::size_t enough to cover billions of possible hash_function combinations ? In order to avoid duplicate states I intend to use their hash_values.
When creating a state_map, should I use as key the State* or the hash value ? i.e: boost::unordered_map<State*,std::size_t> state_map; Or boost::unordered_map<std::size_t,State*> state_map;
Are the lookup times with a boost::unordered_map::iterator = state_map.find() faster than going through a boost::ptr_vector and comparing each iterator's key value ?
Finally, any tips or tricks on how to optimize such an unordered map for speed and fast lookups would be greatly appreciated.

EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function. I have followed boost's documentation and implemented, but I still cannot figure out how to use the hash function of boost with the ordered set.

652

asked Jul 14 '11 00:07

Ælex

2 Answers

This is a bit muddled.

What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).
You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.
size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.
What's crucial is that your hash function is good, i.e. has few collisions.
You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.
Using hash_combine is good.

Baby example:

struct State
{
  inline bool operator==(const State &) const;
  /* Stuff */
};

namespace std
{
  template <> struct hash<State>
  {
    inline std::size_t operator()(const State & s) const
    {
      /* your hash algorithm here */
    }
  };
}

std::size_t Foo(const State & s) { /* some code */ }

int main()
{
  std::unordered_set<State> states; // no extra data needed
  std::unordered_set<State, Foo> states; // another hash function
}

115

answered Sep 25 '22 09:09

Kerrek SB

An unordered_map is a hashtable. You don't store the hash; it is done internally as the storage and lookup method.

Given your requirements, an unordered_set might be more appropriate, since your object is the only item to store.

You are a little confused though -- the equality operator and hash function are not truly performance items, but required for nontrivial objects for the container to work correctly. A good hash function will distribute your nodes evenly across the buckets, and the equality operator will be used to remove any ambiguity about matches based on the hash function.

std::size_t is fine for the hash function. Remember that no hash is perfect; there will be collisions, and these collision items are stored in a linked list at that bucket position.

Thus, .find() will be O(1) in the optimal case and very close to O(1) in the average case (and O(N) in the worst case, but a decent hash function will avoid that.)

You don't mention your platform or architecture; at billions of entries you still might have to worry about out-of-memory situations depending on those and the size of your State object.

answered Sep 26 '22 09:09

Joe

Related questions
                            
                                Why can C++ functors be preferable to objects with named methods?
                            
                                arrays in C++: can you explain what this line of code is doing?
                            
                                Taking a JPEG-encoded screenshot to a buffer using GDI+ and C++
                            
                                Indexing Huge text file
                            
                                How to Determine if STDIN is Empty?
                            
                                Return Function Pointer to Self?
                            
                                Can I create a ActiveX Control from a WPF UserControl and use in a non CLI C++ application?
                            
                                Methods overloading while inheritance in C++
                            
                                Destructor Called Twice While No Copy Constructor or Assignment Operator Gets Callled
                            
                                How can I use std::hex for my custom uint128 type?
                            
                                How to directly map a Lua variable to a C++ variable?
                            
                                C++ overload resolution
                            
                                C++11 Lambda Functions inside member methods inherit scope
                            
                                VS2010 - Structure change in CryptoAPI - v7.0A Vs v6.0A - WinCrypt.h
                            
                                Uncatchable custom exception C++
                            
                                Check if display is turned off by windows power management
                            
                                Random element in STL set/map in log n
                            
                                C++ Non copyable except sometimes
                            
                                std::set::insert, how bad can I hint?
                            
                                error C2143 : missing ';' before '*'

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

C++ some questions on boost::unordered_map & boost::hash

Tags:

c++

hash

unordered-map

boost-unordered

Ælex

People also ask

2 Answers

Kerrek SB

Joe

Recent Activity

Donate For Us