 

How should I go about optimizing a hash table for a given population?

Tags:

java

Say I have a population of key-value pairs which I plan to store in a hash table. The population is fixed and will never change. What optimizations are available to me to make the hash table as fast as possible? Which optimizations should I concentrate on? This is assuming I have a lot of space. There will be a reasonable number of pairs (say no more than 100,000).

EDIT: I want to optimize lookup. I don't care how long it takes to build.

HenryTaylor asked Oct 11 '10

People also ask

How can you improve the performance of a hash table?

The simplest and most obvious improvement would be to increase the number of buckets in the hash table to something like 1.2 million -- at least assuming your hash function can generate numbers in that range (which it typically will).

How do I choose a hash table size?

But a good general "rule of thumb" is:

- The hash table should be an array with length about 1.3 times the maximum number of keys that will actually be in the table, and
- The size of the hash table array should be a prime number.
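As a worked example of this rule (the `nextPrime` helper below is illustrative, not a library method):

```java
// Sketch: pick a table size ~1.3x the key count, rounded up to a prime.
public class TableSizer {
    // Simple trial-division primality check; fine for sizes in this range.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    // Smallest prime >= n.
    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    public static void main(String[] args) {
        int keys = 100_000;
        int size = nextPrime((int) (keys * 1.3));  // 1.3x, then round up to a prime
        System.out.println(size);                  // 130003 for 100,000 keys
    }
}
```

For 100,000 keys this yields 130,003: 1.3 × 100,000 = 130,000, rounded up to the next prime.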

What are the most important considerations when designing a hash function?

In general, a hash function should depend on every single bit of the key, so that two keys that differ in only one bit or one group of bits (regardless of whether the group is at the beginning, end, or middle of the key or present throughout the key) hash into different values.
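As an illustrative sketch of that property, a finalizer step such as the 32-bit murmur3 mixer (the constants below are the well-known murmur3 ones) spreads a single-bit input difference across all output bits:

```java
// Sketch: a 32-bit avalanche mixer (murmur3 finalizer constants), so that
// keys differing in a single bit map to unrelated hash values.
public class Mixer {
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Two keys differing only in one high bit: without mixing they would
        // land in the same small bucket when masked with (buckets - 1).
        int buckets = 16;
        int a = 1, b = 1 | (1 << 20);
        System.out.println((mix(a) & (buckets - 1)) + " vs " + (mix(b) & (buckets - 1)));
    }
}
```

Each step (an xor-shift or a multiply by an odd constant) is invertible, so distinct inputs always produce distinct outputs; the point of the mixing is that *nearby* inputs end up in unrelated buckets.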

What strategies and issues should you consider when you are resizing a hash table?

Hash tables often avoid clustering problems by making sure that the hash table size is a prime number. When you resize the table, double the size and then round up to the first prime number larger than that.


2 Answers

I would make sure that your keys hash to unique values. This ensures that every lookup takes constant time, and is thus as fast as possible.

Since you can never have more than 100,000 keys, it is entirely possible to give each key a unique hash value.

Also, make sure that you use the HashMap constructor that takes an int for the initial capacity (set it to 100,000) and a float for the load factor (use 1.0f). Doing this requires a perfect hash function for your keys, but it will give you the fastest possible lookup in the least amount of memory.
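For example, with Java's HashMap (note that HashMap rounds the requested capacity up to a power of two internally, so 100,000 becomes 131,072 buckets, and with load factor 1.0f no rehash ever occurs while inserting 100,000 entries):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: presize the map so it never rehashes during the build.
public class PresizedTable {
    public static void main(String[] args) {
        // 100_000 is rounded up to 131_072 buckets internally; with load
        // factor 1.0f the resize threshold is 131_072, so 100_000 puts
        // never trigger a rehash.
        Map<String, Integer> table = new HashMap<>(100_000, 1.0f);
        for (int i = 0; i < 100_000; i++) {
            table.put("key" + i, i);
        }
        System.out.println(table.size());
    }
}
```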

jjnguy answered Sep 25 '22


In general, to optimize a hash table you want to minimize collisions, so that each bucket holds at most one item and a lookup returns immediately.

Most of the time, that means you should measure the distribution of your hash function's output over your actual key population. So I'd recommend starting there.
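A rough sketch of such a measurement in Java (the synthetic `"key" + i` strings here are stand-ins for your real key population):

```java
// Sketch: histogram bucket occupancy for a candidate hash over a sample of
// the key population. A few heavily loaded buckets signal clustering.
public class HashAudit {
    // Same spreading and power-of-two masking HashMap uses internally.
    static int bucketOf(Object key, int buckets) {
        int h = key.hashCode();
        h ^= h >>> 16;
        return h & (buckets - 1);
    }

    public static void main(String[] args) {
        int buckets = 131_072;  // next power of two above 100,000
        int[] count = new int[buckets];
        for (int i = 0; i < 100_000; i++) {
            count[bucketOf("key" + i, buckets)]++;
        }
        int max = 0, used = 0;
        for (int c : count) {
            if (c > max) max = c;
            if (c > 0) used++;
        }
        System.out.println("occupied buckets: " + used + ", worst bucket: " + max);
    }
}
```

If the worst bucket holds many items, or only a small fraction of buckets are occupied, the hash function is clustering on this population and is worth replacing.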

samy answered Sep 24 '22