I need a hash function h[n]:[t]
from k wise independent hash family when k is small (<= 5
). Or I need n hash values chosen uniformly randomly from [1-t]
such that they are k wise independent. I am trying to implement some randomized algorithms where I needed this. I was generating n random number from range [1-t]
using
scipy.stats.randint(0,self._t).rvs(self._n)
but this seems to be too slow for my application. Since I don't need full randomness but only 4 wise independence I was wondering if I could speed this up. I know I can use polynomial hash family to get k wise independence but is this the best? If yes, is there any fast implementation of this which I can plug in? If no, what are the alternative ways (libraries, possibly in Python)?
I have looked at this thread Obtaining a k-wise independent hash function but I am not sure what the accepted answer means by this: "if you need k different hash, just re-use the same algorithm k times, with k different seeds".
Any suggestions are greatly appreciated. Thanks.
In computer science, a family of hash functions is said to be k-independent, k-wise independent or k-universal if selecting a function at random from the family guarantees that the hash codes of any designated k keys are independent random variables (see precise mathematical definitions below).
Definition 8. A pairwise-independent hash family is a set of functions H = {h : [m] → [l]} such that for all a, b ∈ [m] and all c, d ∈ [l] we have Prh[h(a) = c∧h(b) = d]=1/l2, where the probability is taken over choosing a uniformly random h ∈ H.
Universal hash families. Family of hash functions H is 2-universal if for any x≠y, Pr[h(x)=h(y)] ≤ 1/n for random h∈H.
This is called pairwise independent because it may be limited to any two variables—for any positive integer k, there's a corresponding notion of k-wise independence. Obviously any pairwise-independent hash family is universal (proof: exercise), but the converse does not hold.
You can try to use numba's jit
with numpy's random.randint()
:
import scipy.stats import numpy as np from numba import jit def randint_scipy(n): return scipy.stats.randint(0, 10000).rvs(n) def randint_numpy(n): return np.random.randint(0, 10000, n) @jit def randint_numpy_jit(n): return np.random.randint(0, 10000, n) %timeit randint_scipy(5) %timeit randint_numpy(5) %timeit randint_numpy_jit(5)
Output:
1.09 ms ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 4.63 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 960 ns ± 50.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Thus, numpy + numba is 1135 times faster than scipy's implementation of randint()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With