
Offset independent hash function

Is there any hash function that generates the same bucket for vectors having the same elements, with the same relative positions but shifted k times?

For example:

hash([1,9,8,7]) -> b1
hash([9,8,7,1]) -> b1

hash([1,8,9,7]) -> b2
hash([1,9,8,5]) -> b3

v1 = [1,9,8,7] and v2 = [9,8,7,1] should get the same hash, since v2 is v1 rotated left by k=1 position (equivalently, rotated right by k=3).

But v3 = [1,8,9,7] doesn't keep the same relative order and v4 = [1,9,8,5] has different values, so neither of them gets the hash b1.

My initial approach was to calculate the maximum value of each vector and use its position as a reference (offset = 0). I would then only have to rotate each vector so that the maximum value is always in the first position; rotated vectors would then look the same. However, vectors can have repeated elements, so the maximum value can occur at several positions.
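
To make the problem with repeated maxima concrete, here is a minimal Python sketch of that approach. The function name `rotate_max_to_front` is only illustrative, and it assumes the vectors are plain lists of comparable values.

```python
def rotate_max_to_front(values):
    """Rotate the list so that the first occurrence of its maximum comes first."""
    if not values:
        return []
    start = values.index(max(values))  # index of the FIRST maximum only
    return values[start:] + values[:start]


# Works when the maximum is unique: both rotations normalize to the same list.
print(rotate_max_to_front([1, 9, 8, 7]) == rotate_max_to_front([9, 8, 7, 1]))  # True

# Breaks when the maximum repeats: [3, 2, 3, 1] is [3, 1, 3, 2] rotated by two
# positions, yet each list already starts with a maximum and is left unchanged.
print(rotate_max_to_front([3, 1, 3, 2]) == rotate_max_to_front([3, 2, 3, 1]))  # False
```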

Pablo Francisco Pérez Hidalgo asked Aug 20 '13 08:08




3 Answers

  1. Find the lexicographically minimal array rotation.

    The naive way is to check all rotations in O(n^2), but it can be done in linear time using Booth's Algorithm, Shiloach's Fast Canonization Algorithm or Duval's Lyndon Factorization Algorithm.

    See this for more.

  2. Calculate the hash of the rotated array.

    This can be done in various ways. Java's String.hashCode, for example, computes it as follows (where ^ denotes exponentiation):

    hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    

It's not impossible that arrays with different elements will hash to the same value (this is inevitable with hashing), but all rotations of the same array will have the same hash.
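
The two steps can be sketched in Python roughly as follows. This is a sketch under assumptions, not a reference implementation: it uses the common two-pointer "minimal representation" scan for the canonical rotation instead of Booth's algorithm, assumes the elements are integers, and truncates the 31-based polynomial to 32 bits. The function names are made up for illustration.

```python
def least_rotation_index(s):
    """Start index of the lexicographically smallest rotation of s, in O(n)."""
    n = len(s)
    i, j, k = 0, 1, 0  # two candidate start positions and the matched length
    while i < n and j < n and k < n:
        a = s[(i + k) % n]
        b = s[(j + k) % n]
        if a == b:
            k += 1
            continue
        if a > b:
            i += k + 1  # no start in i..i+k can be minimal
        else:
            j += k + 1  # no start in j..j+k can be minimal
        if i == j:
            j += 1
        k = 0
    return min(i, j)


def rotation_invariant_hash(values):
    """Hash the canonical (smallest) rotation with a 31-based polynomial."""
    start = least_rotation_index(values)
    canonical = values[start:] + values[:start]
    h = 0
    for v in canonical:
        h = (31 * h + v) & 0xFFFFFFFF  # keep 32 bits, an arbitrary choice
    return h


print(rotation_invariant_hash([1, 9, 8, 7]) == rotation_invariant_hash([9, 8, 7, 1]))  # True
print(rotation_invariant_hash([1, 9, 8, 7]) == rotation_invariant_hash([1, 8, 9, 7]))  # False
```

Because ties are resolved by comparing the elements that follow, repeated values (including a repeated minimum) do not cause the ambiguity that a "put the maximum first" normalization has.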

Bernhard Barker answered Oct 25 '22 01:10



If we concatenate v1 with itself, we get:

[1,9,8,7,1,9,8,7]

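This array contains every cyclic rotation of the original array as a contiguous subarray of length 4.

If we then calculate a hash for every subarray of length 4 and combine these with an order-independent operation (for example a sum or XOR), we get a hash that is the same for every rotation. It is not guaranteed to be unique, since collisions are still possible. The hash calculation may require some optimizing, depending on the size of your arrays.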

EDIT: every subarray, except for the last, which equals the first!
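
A rough Python sketch of this idea, assuming the elements are hashable and that the window hashes are combined by summation (my choice of combining step, since it has to ignore the order of the windows). The names `rotation_windows` and `combined_window_hash` are hypothetical.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # keep the result within 64 bits


def rotation_windows(values):
    """All n windows of length n in values + values, i.e. every rotation."""
    n = len(values)
    doubled = values + values
    # range(n) skips the last window, which would repeat the first one
    return [tuple(doubled[i:i + n]) for i in range(n)]


def combined_window_hash(values):
    """Sum the hashes of all rotations; a sum ignores the order of the windows."""
    total = hash(len(values))
    for window in rotation_windows(values):
        total += hash(window)
    return total & MASK


print(rotation_windows([1, 9, 8, 7]))
# [(1, 9, 8, 7), (9, 8, 7, 1), (8, 7, 1, 9), (7, 1, 9, 8)]
print(combined_window_hash([1, 9, 8, 7]) == combined_window_hash([9, 8, 7, 1]))  # True
```

As written this costs O(n^2) time, because each of the n windows is hashed from scratch. The window hash also should not be a plain linear polynomial: summing such a hash over all rotations depends only on the sum of the elements, so reordered arrays would collide.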

DDW answered Oct 24 '22 23:10



If you do not care so much about the occasional hash collision, you could simply take the sum of all the elements as a hash (but be careful of floating point issues), since that is invariant to any rotation of the vector. Alternatively, you could xor or sum all the hashes of the individual elements. You could also calculate something based on the difference of subsequent elements (while wrapping around for the last to the first element). Add a few of these properties that are invariant to rotation together and the chance that two 'unequal' arrays will yield the same hash will be pretty low. Maybe something like

n = length(x)
rot_invariant_hash = hash(n) + sum(hash(x[i])) + sum(hash(x[mod(i+1, n)] - x[i]))

where you can replace all the sums with any other commutative (and associative) operation such as XOR. Also make sure that the hash function applied to the differences is not the identity function, or these parts will all add up to zero. All this takes O(n) computation time.
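
In Python, the formula above might look like the sketch below. It assumes integer elements, and `mix` is a stand-in for "some non-identity hash": the particular shift-and-multiply scrambling is an arbitrary choice for illustration, not something the formula requires.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # work with 64-bit values


def mix(x):
    """Illustrative non-linear integer scrambler (any decent integer hash would do)."""
    x &= MASK  # also maps negative differences into the 64-bit range
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK
    x ^= x >> 33
    return x


def rot_invariant_hash(x):
    """hash(n) + sum(hash(x[i])) + sum(hash(x[i+1] - x[i])), wrapping around."""
    n = len(x)
    total = mix(n)
    for i in range(n):
        total += mix(x[i])                   # depends only on the multiset of values
        total += mix(x[(i + 1) % n] - x[i])  # cyclic neighbour differences
    return total & MASK


print(rot_invariant_hash([1, 9, 8, 7]) == rot_invariant_hash([9, 8, 7, 1]))  # True
```

Both sums run over the same multiset of terms for every rotation, which is why the result does not change. Arrays that are not rotations of each other can still collide, for example when they share both the same elements and the same set of neighbour differences.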

Just a curiosity: what is your intended application?

Bas Swinckels answered Oct 24 '22 23:10
