
Offset independent hash function

Tags: arrays, algorithm, hash

Is there any hash function that generates the same bucket for vectors having the same elements, with the same relative positions but shifted k times?

For example:

hash([1,9,8,7]) -> b1
hash([9,8,7,1]) -> b1

hash([1,8,9,7]) -> b2
hash([1,9,8,5]) -> b3

Take v1 = [1,9,8,7] and v2 = [9,8,7,1]. Both vectors should get the same hash, since v2 is v1 rotated left by one position (equivalently, rotated right by k=3 positions).

But v3 = [1,8,9,7] doesn't keep the same relative order and v4 = [1,9,8,5] has different values, so neither of them gets the hash b1.

My initial approach was to calculate the maximum value of each vector and use its position as a reference (offset = 0). I would then only have to shift each vector so that the maximum value is always at the first position. This way, shifted vectors would look the same. However, vectors can have repeated elements, so the maximum value can occur at several positions and the reference is no longer unique.
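
To illustrate the problem with repeated maxima, here is a small sketch of that approach. The helper name `rotate_max_to_front` is only illustrative, and it assumes the first occurrence of the maximum is used as the reference.

```python
def rotate_max_to_front(v):
    """Rotate v so that the first occurrence of its maximum is at index 0."""
    k = v.index(max(v))
    return v[k:] + v[:k]


# Unique maximum: both rotations normalise to the same vector.
print(rotate_max_to_front([1, 9, 8, 7]))
print(rotate_max_to_front([9, 8, 7, 1]))

# Repeated maximum: b is a rotation of a, yet the normalised forms differ.
a = [3, 1, 3, 2]
b = a[2:] + a[:2]
print(rotate_max_to_front(a))
print(rotate_max_to_front(b))
```

With a unique maximum the two rotations agree, but `a` and `b` above stay different after normalisation, which is exactly the case I need to handle.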


asked Aug 20 '13 08:08

Pablo Francisco Pérez Hidalgo

3 Answers

1. Find the lexicographically minimal array rotation.

   The naive way is to check all rotations in O(n²), but it can be done in linear time using Booth's algorithm, Shiloach's fast canonization algorithm or Duval's Lyndon factorization algorithm.

2. Calculate the hash of the rotated array.

   This can be done in various ways. Java's String.hashCode, for example, computes it as follows:
```
hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
```

Arrays with different elements can still hash to the same value (collisions are inevitable with hashing), but all rotations of the same array will have the same hash.
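
The two steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it finds the minimal rotation with a two-pointer scan (a different linear-time technique from the three algorithms named above), it assumes the elements are hashable and mutually comparable, and the function names, the base 31 and the modulus are illustrative choices.

```python
def least_rotation_start(a):
    """Return the start index of the lexicographically smallest rotation of a."""
    n = len(a)
    i, j, k = 0, 1, 0  # two candidate starts and the length matched so far
    while i < n and j < n and k < n:
        x = a[(i + k) % n]
        y = a[(j + k) % n]
        if x == y:
            k += 1
            continue
        # The candidate with the larger element loses, together with the
        # k starts that follow it, so skip past all of them.
        if x > y:
            i += k + 1
        else:
            j += k + 1
        if i == j:
            j += 1
        k = 0
    return min(i, j)


def rotation_invariant_hash(a, base=31, mod=(1 << 61) - 1):
    """Polynomial hash of the canonical (minimal) rotation of a."""
    n = len(a)
    start = least_rotation_start(a)
    h = 0
    for t in range(n):
        h = (h * base + hash(a[(start + t) % n])) % mod
    return h


print(rotation_invariant_hash([1, 9, 8, 7]) == rotation_invariant_hash([9, 8, 7, 1]))
print(rotation_invariant_hash([1, 9, 8, 7]) == rotation_invariant_hash([1, 8, 9, 7]))
```

Because the canonical rotation is chosen by comparing whole rotations rather than a single element, repeated values such as those in [2,1,2,1] do not cause ambiguity.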

answered Oct 25 '22 01:10

Bernhard Barker

If we concatenate v1 with itself, we get:

[1,9,8,7,1,9,8,7]

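This array contains every cyclic permutation of the original array as a contiguous subarray of length 4.

If we then calculate a hash for every subarray of length 4 and combine these hashes with an order-independent operation (such as addition or XOR), the result is the same for every rotation of the array, because each rotation produces the same set of subarrays. Computing every subarray hash from scratch costs O(n²), so the hash calculation may need some optimizing, depending on the size of your arrays.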

EDIT: every subarray, except for the last, which equals the first!
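
A rough Python sketch of this idea follows. The function name and the modulus are illustrative assumptions, the per-window hash is Python's built-in tuple hash, and the windows are combined by addition; it is the straightforward O(n²) version without any optimization.

```python
def window_combined_hash(a, mod=(1 << 61) - 1):
    """Sum the hashes of all length-n windows of a concatenated with itself."""
    n = len(a)
    doubled = list(a) + list(a)
    total = 0
    # Only the first n windows are used: the window starting at index n
    # equals the one starting at index 0.
    for start in range(n):
        window = tuple(doubled[start:start + n])
        total = (total + hash(window)) % mod
    return total


print(window_combined_hash([1, 9, 8, 7]) == window_combined_hash([9, 8, 7, 1]))
```

One design point worth noting: the per-window hash should not be a plain linear polynomial hash, because summing a linear hash over all rotations collapses to a multiple of the sum of the elements, which would make the result insensitive to element order.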

answered Oct 24 '22 23:10

DDW

If you do not care much about the occasional hash collision, you could simply use the sum of all the elements as the hash (but be careful with floating-point issues), since the sum is invariant under any rotation of the vector. Alternatively, you could XOR or sum the hashes of the individual elements. You could also calculate something based on the differences between consecutive elements (wrapping around from the last element to the first). Combine a few of these rotation-invariant properties and the chance that two 'unequal' arrays yield the same hash will be pretty low. Maybe something like

n = length(x)
rot_invariant_hash = hash(n) + sum(hash(x[i])) + sum(hash(x[mod(i+1, n)] - x[i]))

where you can replace the sums with any other commutative and associative operation, such as XOR. Also make sure that the hash function applied to the differences is not the identity function, or these terms will add up to zero. All of this takes O(n) computation time.

Just out of curiosity: what is your intended application?
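
As a runnable sketch of the formula above, assuming integer elements: the helper `mix` (an integer mixer in the style of the splitmix64 finalizer) stands in for the generic `hash` so that it is clearly not the identity function. Both names and the 64-bit mask are illustrative choices.

```python
MASK = (1 << 64) - 1


def mix(v):
    """Non-linear 64-bit integer mixer (splitmix64-style finalizer)."""
    z = (v + 0x9E3779B97F4A7C15) & MASK
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return z ^ (z >> 31)


def rot_invariant_hash(x):
    """Combine rotation-invariant features: length, elements, cyclic differences."""
    n = len(x)
    h = mix(n)
    for i in range(n):
        h += mix(x[i])                     # multiset of elements
        h += mix(x[(i + 1) % n] - x[i])    # cyclic neighbour differences
    return h & MASK


print(rot_invariant_hash([1, 9, 8, 7]) == rot_invariant_hash([9, 8, 7, 1]))
```

Every term is summed over all positions, so rotating the input only reorders the terms and leaves the total unchanged.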

answered Oct 24 '22 23:10

Bas Swinckels
