I've been doing some performance analysis on the software I develop, and I've found that lookups in a global dictionary of URLs take about 10% of the application's "load" phase time. The dictionary is implemented as a C++ STL std::map, which has O(lg n) lookups. I'm going to move it to a hash_map, which has roughly constant-time lookups. The STL string class doesn't have a hash code property, and it certainly doesn't cache a hash code. That means each lookup requires regenerating the hash code.
I'm skeptical that caching the hash code is worth the effort. It would mean changing many lines of code to use a new string class with a cached hash code property. Given that the current implementation does log(n) full string comparisons on every lookup, I think reducing it to basically one string traversal (by the hash function) per lookup is a big win.
Does anyone have experience with caching string hash codes? Has it ever proven worth the effort?
One of the most common applications of hashing strings is comparing them. Comparing two strings of length N takes O(N) time, but comparing two integers takes O(1). Hence, comparing the hashes of two strings takes O(1) time.
If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value: different strings can return the same hash code, and the hash code itself is not guaranteed to be stable across processes or versions.
The complexity of a hash function over a string is never O(1): if the length of the string is n, computing its hash is O(n). However, if you compute all the hashes in a given array up front, you never have to compute them a second time, and you can compare two strings in O(1) by comparing the precalculated hashes (a hash match still needs one full compare to rule out collisions).
I don't have experience with caching hash codes, but I've done some work recently converting std::map to std::tr1::unordered_map. Two thoughts come to mind. First, try profiling that relatively simple change by itself, because it sometimes makes things worse, depending on what your code is doing; it might give you enough of a speedup on its own before you try optimizing further. Second, what does your profiler say about the other 90% of your initialization time? Even if you optimized the global dictionary down to zero time, you would at most improve performance by 10%.
One word of warning: while a hash map can have constant-time lookups, it can also degrade to O(N) lookups. It's not a common case, but it does happen. With a map, on the other hand, you always pay the O(log N) cost, but you are also guaranteed it will never be worse.
When you compare the hash map to the map, also try a Trie, or related data structure (whatever you can get off the shelf):
Trie implementation
Unfortunately you may then spend a lot of time worrying about cache-friendliness. In that respect a Trie is similar to the tree you already have, and a hash map will probably be better-behaved than a naively-allocated tree.
Also, I'm a little confused by the question. If you're looking up the same string object multiple times, such that caching its hash value is worthwhile, shouldn't you just be caching the result of the lookup? The whole point of a hash table is that different objects which are value-equal hash to the same value. If you aren't computing the same hash several times from distinct strings containing the same characters, then your hash table probably isn't doing its job.
If you mean caching the values of the keys already in the hash table, that's up to the hash table.