Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to generate hash function from k wise independent hash family for small k(<=5)

Tags:

I need a hash function h[n]:[t] from k wise independent hash family when k is small (<= 5). Or I need n hash values chosen uniformly randomly from [1-t] such that they are k wise independent. I am trying to implement some randomized algorithms where I needed this. I was generating n random number from range [1-t] using

scipy.stats.randint(0,self._t).rvs(self._n)

but this seems to be too slow for my application. Since I don't need full randomness but only 4 wise independence I was wondering if I could speed this up. I know I can use polynomial hash family to get k wise independence but is this the best? If yes, is there any fast implementation of this which I can plug in? If no, what are the alternative ways (libraries, possibly in Python)?

I have looked at this thread Obtaining a k-wise independent hash function but I am not sure what the accepted answer means by this: "if you need k different hash, just re-use the same algorithm k times, with k different seeds".

Any suggestions are greatly appreciated. Thanks.

like image 943
user1131274 Avatar asked Sep 27 '17 14:09

user1131274


People also ask

What is K in hash function?

In computer science, a family of hash functions is said to be k-independent, k-wise independent or k-universal if selecting a function at random from the family guarantees that the hash codes of any designated k keys are independent random variables (see precise mathematical definitions below).

What is pairwise independent hashing?

Definition 8. A pairwise-independent hash family is a set of functions H = {h : [m] → [l]} such that for all a, b ∈ [m] and all c, d ∈ [l] we have Prh[h(a) = c∧h(b) = d]=1/l2, where the probability is taken over choosing a uniformly random h ∈ H.

What is a 2 universal hash function?

Universal hash families. Family of hash functions H is 2-universal if for any x≠y, Pr[h(x)=h(y)] ≤ 1/n for random h∈H.

Is pairwise independent always universal?

This is called pairwise independent because it may be limited to any two variables—for any positive integer k, there's a corresponding notion of k-wise independence. Obviously any pairwise-independent hash family is universal (proof: exercise), but the converse does not hold.


1 Answers

You can try to use numba's jit with numpy's random.randint():

import scipy.stats import numpy as np from numba import jit  def randint_scipy(n):     return scipy.stats.randint(0, 10000).rvs(n)  def randint_numpy(n):     return np.random.randint(0, 10000, n)  @jit def randint_numpy_jit(n):     return np.random.randint(0, 10000, n)  %timeit randint_scipy(5) %timeit randint_numpy(5) %timeit randint_numpy_jit(5) 

Output:

1.09 ms ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 4.63 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 960 ns ± 50.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 

Thus, numpy + numba is 1135 times faster than scipy's implementation of randint().

like image 92
Dmitry Mottl Avatar answered Sep 29 '22 11:09

Dmitry Mottl