What would be a good hash function for an array of integers? For example I have two arrays [1,2,3] and [1,5]. What hash functions should I adopt to seperate both these arrays? I thought of adding up each element after raising it to the power of 2 but this has a large cost associated with it due to multiple multiplications. Is there any simple hashing function for this scenario?
For that particular data set, just subtract one from the second-to-last item, that will give you a perfect minimal hash, with buckets 0 and 1 produced :-)
More seriously, the choice of a good hashing function does depend a great deal on the sort of data so that should be taken into consideration. It's hard to suggest something without knowing the properties of the data you'll be storing.
I would start simply by choosing an arbitrary function such as adding all the items in the array then adding the array length to that, and reducing it modulo some value:
numbuckets = 97
bucket = array.length() % numbuckets
for index in range (array.length()):
bucket = (bucket + array[index]) % numbuckets
Then examine the results (across a great many real data sets) to make sure there's not too many collisions. If there are, choose another function.
It's the same as with optimisation: measure, don't guess! Actually monitor the collisions and usage and act if it gets bad.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With