
Good hash function for permutations?

I have got numbers in a specific range (usually from 0 to about 1000). An algorithm selects some numbers from this range (about 3 to 10 numbers). This selection is done quite often, and I need to check if a permutation of the chosen numbers has already been selected.

E.g. one step selects [1, 10, 3, 18] and another one selects [10, 18, 3, 1]; the second selection can then be discarded because it is a permutation of the first.

I need to do this check very fast. Right now I put all arrays in a hashmap and use a custom hash function that just sums up all the elements, so 1+10+3+18=32 and also 10+18+3+1=32. For equals() I use a bitset to quickly check whether the same elements are in both sets (with the bitset I do not need to sort, but it only works when the range of numbers is known and not too big).
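A minimal sketch of the setup described above, written in Python for brevity. The names `sum_hash`, `to_bitmask` and `SelectionFilter` are hypothetical, a plain integer stands in for the bitset, and the sketch assumes non-negative values with no repeats inside one selection:

```python
from collections import defaultdict


def sum_hash(selection):
    """Order-independent hash: permutations always share the same sum."""
    return sum(selection)


def to_bitmask(selection):
    """Pack a selection into one integer, with bit i set when value i is present.

    Assumes non-negative values without repeats inside one selection.
    """
    mask = 0
    for value in selection:
        mask |= 1 << value
    return mask


class SelectionFilter:
    """Hypothetical helper that remembers selections up to permutation."""

    def __init__(self):
        # sum hash -> bitmasks of the selections seen with that hash
        self._buckets = defaultdict(list)

    def add_if_new(self, selection):
        """Return True if no permutation of `selection` was added before."""
        mask = to_bitmask(selection)
        bucket = self._buckets[sum_hash(selection)]
        if mask in bucket:  # the equality check, run against each colliding entry
            return False
        bucket.append(mask)
        return True


seen = SelectionFilter()
print(seen.add_if_new([1, 10, 3, 18]))  # first time this selection is seen
print(seen.add_if_new([10, 18, 3, 1]))  # permutation of the first one
print(seen.add_if_new([2, 30]))         # same sum (32), different elements
```

The last call lands in the same bucket as the first selection because both sum to 32, which is exactly the kind of collision that forces the extra equality check.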

This works ok, but can generate lots of collisions, so the equals() method is called quite often. I was wondering if there is a faster way to check for permutations?

Are there any good hash functions for permutations?

UPDATE

I have done a little benchmark: generate all combinations of numbers in the range 0 to 6, and array length 1 to 9. There are 3003 possible selections that are not permutations of one another, and a good hash should generate close to this many different hashes (I use 32-bit numbers for the hash):

  • 41 different hashes for just adding (so there are lots of collisions)
  • 8 different hashes for XOR'ing values together
  • 286 different hashes for multiplying
  • 3003 different hashes for (R + 2e) and multiplying as abc has suggested (using 1779033703 for R)

So abc's hash can be calculated very fast and is a lot better than all the rest. Thanks!
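The benchmark can be sketched roughly as follows. The exact setup is an assumption: reading the ranges as values 0 to 5 and lengths 0 to 8 yields 3003 selections and reproduces the counts 41, 8 and 286 listed above, so that reading is used here. All function names are hypothetical, and `MASK32` emulates 32-bit wraparound in Python:

```python
from functools import reduce
from itertools import combinations_with_replacement
from operator import xor

MASK32 = 0xFFFFFFFF  # keep results in 32 bits, like an unsigned int


def add_hash(values):
    return sum(values) & MASK32


def xor_hash(values):
    return reduce(xor, values, 0)


def mul_hash(values):
    product = 1
    for e in values:
        product = (product * e) & MASK32
    return product


def odd_product_hash(values, r=1779033703):
    # Product of the odd factors (r + 2*e), halved at the end.
    product = 1
    for e in values:
        product = (product * (r + 2 * e)) & MASK32
    return product >> 1


def count_distinct(hash_fn, selections):
    """Number of different hash values produced over all selections."""
    return len({hash_fn(s) for s in selections})


# Assumed setup: values 0..5, lengths 0..8, one tuple per selection
# regardless of order (repeated values allowed).
selections = [
    combo
    for length in range(0, 9)
    for combo in combinations_with_replacement(range(6), length)
]

for name, fn in [
    ("add", add_hash),
    ("xor", xor_hash),
    ("multiply", mul_hash),
    ("(R + 2e) product", odd_product_hash),
]:
    print(name, count_distinct(fn, selections), "of", len(selections))
```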

PS: I do not want to sort the values when I do not have to, because this would get too slow.

martinus asked Oct 08 '09 08:10



1 Answer

One potential candidate might be this. Fix an odd integer R. For each element e you want to hash, compute the factor (R + 2*e). Then compute the product of all these factors. Finally, divide the product by 2 to get the hash.

The factor 2 in (R + 2e) guarantees that every factor is odd, so the product (taken modulo 2^32) is odd as well and can never become 0. Because the lowest bit of the product is therefore always 1, it carries no information, and the division by 2 at the end just removes that constant bit.
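The recipe above can be sketched in a few lines. This is only an illustration, not a tuned implementation: the function name `product_hash` is hypothetical, the default R is the constant used in this thread, and the `& MASK32` step emulates the 32-bit integer overflow that a fixed-width language would give for free:

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned wraparound in Python


def product_hash(values, r=1779033703):
    """Multiply the odd factors (r + 2*e) modulo 2**32, then drop the low bit."""
    if r % 2 == 0:
        raise ValueError("r must be odd so that every factor is odd")
    product = 1
    for e in values:
        product = (product * (r + 2 * e)) & MASK32
    # The product is always odd, so the lowest bit is constant; shift it out.
    return product >> 1


# Multiplication is commutative, so the order of the elements does not matter.
print(product_hash([1, 10, 3, 18]) == product_hash([10, 18, 3, 1]))
```

Since only multiplications and one shift are involved, no sorting or extra memory is needed per selection.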

E.g. I choose R = 1779033703. This is an arbitrary choice; some experiments should show whether a given R is good or bad. Assume your values are [1, 10, 3, 18]. The product (computed using 32-bit ints) is

(R + 2) * (R + 20) * (R + 6) * (R + 36) = 108059677 (mod 2^32)

Hence the hash would be

108059677/2 = 54029838 (integer division).

abc answered Sep 18 '22 17:09