Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash function for an array of integers

Tags:

hash

What would be a good hash function for an array of integers? For example I have two arrays [1,2,3] and [1,5]. What hash functions should I adopt to seperate both these arrays? I thought of adding up each element after raising it to the power of 2 but this has a large cost associated with it due to multiple multiplications. Is there any simple hashing function for this scenario?

like image 614
Chimba Avatar asked Sep 11 '14 03:09

Chimba


1 Answers

For that particular data set, just subtract one from the second-to-last item, that will give you a perfect minimal hash, with buckets 0 and 1 produced :-)

More seriously, the choice of a good hashing function does depend a great deal on the sort of data so that should be taken into consideration. It's hard to suggest something without knowing the properties of the data you'll be storing.

I would start simply by choosing an arbitrary function such as adding all the items in the array then adding the array length to that, and reducing it modulo some value:

numbuckets = 97
bucket = array.length() % numbuckets
for index in range (array.length()):
    bucket = (bucket + array[index]) % numbuckets

Then examine the results (across a great many real data sets) to make sure there's not too many collisions. If there are, choose another function.

It's the same as with optimisation: measure, don't guess! Actually monitor the collisions and usage and act if it gets bad.

like image 193
paxdiablo Avatar answered Jan 02 '23 11:01

paxdiablo