Fastest way to generate hash function from k wise independent hash family for small k(<=5)

Q: What is K in hash function?

In computer science, a family of hash functions is said to be k-independent, k-wise independent or k-universal if selecting a function at random from the family guarantees that the hash codes of any designated k keys are independent random variables (see precise mathematical definitions below).

Q: What is pairwise independent hashing?

Definition 8. A pairwise-independent hash family is a set of functions H = {h : [m] → [l]} such that for all a, b ∈ [m] and all c, d ∈ [l] we have Prh[h(a) = c&and;h(b) = d]=1/l2, where the probability is taken over choosing a uniformly random h ∈ H.

Q: What is a 2 universal hash function?

Universal hash families. Family of hash functions H is 2-universal if for any x≠y, Pr[h(x)=h(y)] ≤ 1/n for random h∈H.

Q: Is pairwise independent always universal?

This is called pairwise independent because it may be limited to any two variables—for any positive integer k, there's a corresponding notion of k-wise independence. Obviously any pairwise-independent hash family is universal (proof: exercise), but the converse does not hold.

Tags:

I need a hash function h[n]:[t] from k wise independent hash family when k is small (<= 5). Or I need n hash values chosen uniformly randomly from [1-t] such that they are k wise independent. I am trying to implement some randomized algorithms where I needed this. I was generating n random number from range [1-t] using

scipy.stats.randint(0,self._t).rvs(self._n)

but this seems to be too slow for my application. Since I don't need full randomness but only 4 wise independence I was wondering if I could speed this up. I know I can use polynomial hash family to get k wise independence but is this the best? If yes, is there any fast implementation of this which I can plug in? If no, what are the alternative ways (libraries, possibly in Python)?

I have looked at this thread Obtaining a k-wise independent hash function but I am not sure what the accepted answer means by this: "if you need k different hash, just re-use the same algorithm k times, with k different seeds".

Any suggestions are greatly appreciated. Thanks.

943

asked Sep 27 '17 14:09

user1131274

1 Answers

You can try to use numba's jit with numpy's random.randint():

import scipy.stats import numpy as np from numba import jit  def randint_scipy(n):     return scipy.stats.randint(0, 10000).rvs(n)  def randint_numpy(n):     return np.random.randint(0, 10000, n)  @jit def randint_numpy_jit(n):     return np.random.randint(0, 10000, n)  %timeit randint_scipy(5) %timeit randint_numpy(5) %timeit randint_numpy_jit(5)

Output:

1.09 ms ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 4.63 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 960 ns ± 50.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Thus, numpy + numba is 1135 times faster than scipy's implementation of randint().

answered Sep 29 '22 11:09

Dmitry Mottl

Related questions
                            
                                What does `HTTPContext.SignInAsync` do behind the scenes?
                            
                                .Net Core Process.Start on Linux
                            
                                docker and mysql: Got an error reading communication packets
                            
                                Removing SCNNode nodes from scene.rootNode is causing a crash, in SceneKit
                            
                                Spotify Web API - Authenticate user via native mobile app rather than browser?
                            
                                Execution failed for task ':app:compileDebugJavaWithJavac':
                            
                                WEB_HOOK channel unavailable when trying to retrieve Google calendar events
                            
                                Linkedin: Sharing URL Summary not appearing
                            
                                JobScheduler JobService is started without Application
                            
                                Bluez Profile Registration
                            
                                Angular 6 - Why is Bearer Token missing in production build? (works fine in dev build)
                            
                                Run Selenium parallel test on Azure batch

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With