
How to improve performance of a hashtable with 1 million elements and 997 buckets?

This is an interview question.

Suppose that there are 1 million elements in the table and 997 buckets of unordered lists. Further suppose that the hash function distributes keys with equal probability (i.e., each bucket has 1000 elements).

What is the worst case time to find an element which is not in the table? To find one which is in the table? How can you improve this?

My solution: In both cases the worst-case time is proportional to the length of the bucket's unsorted list, i.e., about 1000 comparisons. For an element not in the table you always scan the entire list; for an element in the table you may scan up to the entire list.
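To make that concrete, here is a minimal sketch of a chained hash table (the class and its methods are hypothetical names, not from the question), showing why the lookup cost tracks the chain length:

```python
# Hypothetical sketch: a hash table with separate chaining.
# With 997 buckets and 1,000,000 evenly distributed keys, each bucket's
# list holds roughly 1000 entries, so a single lookup may scan ~1000 of them.

class ChainedHashTable:
    def __init__(self, num_buckets=997):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key, value):
        self.buckets[hash(key) % len(self.buckets)].append((key, value))

    def find(self, key):
        # Worst case: walk the whole chain, both for a miss and for a
        # key that happens to sit at the end of the chain.
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None
```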

Ways to improve it: (0) Straightforward: increase the number of buckets to more than 1 million. (1) Make each bucket a second hash table that uses a different hash function; lookup becomes expected O(1). (2) Replace each bucket's list with a binary search tree; lookup becomes O(lg n) per bucket.
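Improvement (2) can be sketched as follows. This is a hypothetical illustration: a sorted list searched with Python's `bisect` stands in for a balanced search tree, which is enough to show the O(log n) membership test (a real tree would also keep inserts logarithmic):

```python
import bisect

# Hypothetical sketch of improvement (2): keep each bucket in sorted
# order so a membership test costs O(log n) comparisons instead of a
# linear scan of ~1000 entries.

class SortedBucketTable:
    def __init__(self, num_buckets=997):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        i = bisect.bisect_left(bucket, key)
        if i == len(bucket) or bucket[i] != key:
            bucket.insert(i, key)  # O(n) here; O(log n) in a real tree

    def contains(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        i = bisect.bisect_left(bucket, key)  # binary search: O(log n)
        return i < len(bucket) and bucket[i] == key
```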

Is it possible to make a trade-off between space and time, keeping both within a reasonable range?

Any better ideas? Thanks!

asked Feb 06 '12 by user1002288


1 Answer

The simplest and most obvious improvement would be to increase the number of buckets in the hash table to something like 1.2 million -- at least assuming your hash function can generate numbers in that range (which it typically will).
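A quick back-of-the-envelope check of this suggestion (the figures below are derived from the numbers in the question, not stated in the answer): once there are more buckets than elements, the load factor, which equals the average chain length under a uniform hash, drops below 1.

```python
# Average chain length = load factor = elements / buckets.
# 997 buckets  -> ~1003 entries per chain (the situation in the question)
# 1.2M buckets -> ~0.83 entries per chain (the answer's suggestion)

elements = 1_000_000

for buckets in (997, 1_200_000):
    load_factor = elements / buckets
    print(f"{buckets:>9} buckets -> average chain length {load_factor:.2f}")
```

So with ~1.2 million buckets the average lookup inspects about one entry instead of about a thousand.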

answered Sep 23 '22 by Jerry Coffin