Good hash function for permutations?

Tags: performance, hash, permutation

I have numbers in a known range (usually from 0 to about 1000). An algorithm selects some numbers from this range (about 3 to 10 numbers). This selection is done very often, and I need to check whether a permutation of the chosen numbers has already been selected.

E.g., if one step selects [1, 10, 3, 18] and another selects [10, 18, 3, 1], then the second selection can be discarded because it is a permutation of the first.

I need this check to be very fast. Right now I put all arrays in a hashmap and use a custom hash function that just sums up all the elements, so 1+10+3+18=32, and also 10+18+3+1=32. For equals I use a bitset to quickly check whether the elements are in both sets (the bitset means I do not need to sort, but it only works when the range of numbers is known and not too big).
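The current approach described above might look roughly like the following sketch. The class name `Selection`, the constant `RANGE`, and the assumption that a selection never contains the same number twice are mine, not part of the original code.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical key class: wraps one selection so it can be stored in a HashSet/HashMap.
final class Selection {
    private static final int RANGE = 1001; // assumption: values lie in 0..1000
    private final int[] values;

    Selection(int[] values) {
        this.values = values.clone();
    }

    // Order-independent but weak hash: the plain sum of the elements.
    @Override
    public int hashCode() {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Set comparison via a bitset; assumes no duplicates inside a selection.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Selection)) {
            return false;
        }
        int[] other = ((Selection) o).values;
        if (other.length != values.length) {
            return false;
        }
        BitSet seen = new BitSet(RANGE);
        for (int v : values) {
            seen.set(v);
        }
        for (int v : other) {
            if (!seen.get(v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Selection> selected = new HashSet<>();
        System.out.println(selected.add(new Selection(new int[] {1, 10, 3, 18}))); // true: new selection
        System.out.println(selected.add(new Selection(new int[] {10, 18, 3, 1}))); // false: permutation
    }
}
```

Every pair of selections with the same sum lands in the same bucket, which is why equals() ends up being called so often.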

This works ok, but can generate lots of collisions, so the equals() method is called quite often. I was wondering if there is a faster way to check for permutations?

Are there any good hash functions for permutations?

UPDATE

I have done a little benchmark: generate all combinations of numbers in the range 0 to 6, and array length 1 to 9. There are 3003 distinct selections (ignoring order), and a good hash should generate close to this many different hashes (I use 32-bit numbers for the hash):

41 different hashes for just adding (so there are lots of collisions)
8 different hashes for XOR'ing values together
286 different hashes for multiplying
3003 different hashes for (R + 2e) and multiplying as abc has suggested (using 1779033703 for R)

So abc's hash can be calculated very fast and is a lot better than all the rest. Thanks!
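A benchmark along these lines could be sketched as follows. The exact bounds are not fully specified in the post, so this sketch assumes values 0 to 5 and lengths 0 to 8 (`MAX_VALUE` and `MAX_LENGTH` are my choices, picked because they give exactly 3003 multisets). The class and method names are hypothetical, and the unsigned shift stands in for the division by 2.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.ToIntFunction;

final class HashBenchmark {
    static final int R = 1779033703;  // constant reported in the post
    static final int MAX_VALUE = 5;   // assumption: values 0..5
    static final int MAX_LENGTH = 8;  // assumption: lengths 0..8

    static int sum(int[] a)     { int h = 0; for (int v : a) h += v; return h; }
    static int xor(int[] a)     { int h = 0; for (int v : a) h ^= v; return h; }
    static int product(int[] a) { int h = 1; for (int v : a) h *= v; return h; }

    // Product of the odd factors (R + 2*e) in wrapping int arithmetic, then halved.
    static int oddProduct(int[] a) {
        int h = 1;
        for (int v : a) {
            h *= R + 2 * v;
        }
        return h >>> 1;
    }

    // Collision-free key for a sorted array (base-7 digits v + 1); only used to count the multisets.
    static int exactKey(int[] sorted) {
        int h = 0;
        for (int v : sorted) {
            h = h * 7 + v + 1;
        }
        return h;
    }

    // Number of distinct hash values over all multisets within the assumed bounds.
    static int countDistinct(ToIntFunction<int[]> hash) {
        Set<Integer> seen = new HashSet<>();
        for (int len = 0; len <= MAX_LENGTH; len++) {
            fill(new int[len], 0, 0, hash, seen);
        }
        return seen.size();
    }

    // Enumerates non-decreasing arrays, i.e. one representative per multiset.
    private static void fill(int[] buf, int pos, int min, ToIntFunction<int[]> hash, Set<Integer> seen) {
        if (pos == buf.length) {
            seen.add(hash.applyAsInt(buf));
            return;
        }
        for (int v = min; v <= MAX_VALUE; v++) {
            buf[pos] = v;
            fill(buf, pos + 1, v, hash, seen);
        }
    }

    public static void main(String[] args) {
        System.out.println("multisets:   " + countDistinct(HashBenchmark::exactKey));
        System.out.println("sum:         " + countDistinct(HashBenchmark::sum));
        System.out.println("xor:         " + countDistinct(HashBenchmark::xor));
        System.out.println("product:     " + countDistinct(HashBenchmark::product));
        System.out.println("odd product: " + countDistinct(HashBenchmark::oddProduct));
    }
}
```

With these assumed bounds the enumeration covers 3003 multisets, and the sum, XOR and plain-product counts come out as 41, 8 and 286, consistent with the figures reported above.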

PS: I do not want to sort the values when I do not have to, because this would get too slow.

745

asked Oct 08 '09 08:10

martinus

1 Answer

One potential candidate is the following. Fix an odd integer R. For each element e you want to hash, compute the factor (R + 2*e). Then compute the product of all these factors. Finally, divide the product by 2 to get the hash.

The factor 2 in (R + 2e) guarantees that all factors are odd, so the product (taken modulo 2^32) is odd as well and can never become 0. The division by 2 at the end is there because the product will always be odd, so the division just removes a constant bit.

E.g. I choose R = 1779033703. This is an arbitrary choice; some experiments should show whether a given R is good or bad. Assume your values are [1, 10, 3, 18]. The product (computed using 32-bit ints, i.e. modulo 2^32) is

(R + 2) * (R + 20) * (R + 6) * (R + 36) = 108059677

Hence the hash would be

108059677/2 = 54029838.
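In Java, the scheme described above could be sketched like this. The class name `PermutationHash` is hypothetical, and using the unsigned shift `>>> 1` in place of the division by 2 is my choice, since Java's `int` is signed; treat it as one possible reading rather than the definitive implementation.

```java
final class PermutationHash {
    // Odd constant taken from the example above; other odd values may work as well.
    static final int R = 1779033703;

    // Order-independent hash: product of (R + 2*e) in wrapping 32-bit arithmetic, then halved.
    static int hash(int[] values) {
        int product = 1;
        for (int e : values) {
            product *= R + 2 * e; // every factor is odd, so the product stays odd and never becomes 0
        }
        return product >>> 1; // unsigned shift drops the low bit, which is always 1
    }

    public static void main(String[] args) {
        int first = hash(new int[] {1, 10, 3, 18});
        int second = hash(new int[] {10, 18, 3, 1});
        System.out.println(first == second); // true: same factors, different order
    }
}
```

Because multiplication is commutative, any ordering of the same elements produces the same product, so no sorting is needed before hashing. Unequal hashes prove two selections differ, but equal hashes still need an equals() check.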

121

answered Sep 18 '22 17:09

abc
