I'm trying to sample 1000 numbers between 0 and 999, with a vector of weights dictating the probability that a particular number will be chosen:
import numpy as np
resampled_indices = np.random.choice(a = 1000, size = 1000, replace = True, p = weights)
Unfortunately, this process has to be run thousands of times in a larger for loop, and it seems that np.random.choice
is the main speed bottleneck in the process. As such, I was wondering if there's any way to speed up np.random.choice
or to use an alternative method that gives the same results.
np.random.choice returns randomly selected elements from a given sequence (or from range(a) when a is an integer); with size set it returns a numpy.ndarray of samples, and with p set each element is drawn with the supplied probability. The reason NumPy is fast when used right is that its arrays are efficient C-style arrays rather than Python lists, so one vectorized call producing 1000 samples is far cheaper than 1000 scalar calls.
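As a minimal illustration of the vectorized call described above (the uniform weight vector here is made up for the example):

```python
import numpy as np

# toy weights for illustration: any non-negative vector that sums to 1
weights = np.ones(1000) / 1000.0

# one vectorized call draws all 1000 samples at once
resampled_indices = np.random.choice(a=1000, size=1000, replace=True, p=weights)
print(resampled_indices.shape)  # (1000,)
```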
It seems you can do slightly better by using uniform sampling and then "inverting" the cumulative distribution with np.searchsorted:
# assume arbitrary probabilities
weights = np.random.randn(1000)**2
weights /= weights.sum()

def weighted_random(w, n):
    cumsum = np.cumsum(w)
    rdm_unif = np.random.rand(n)
    return np.searchsorted(cumsum, rdm_unif)
# first method
%timeit np.random.choice(a = 1000, size = 1000, replace = True, p = weights)
# 10000 loops, best of 3: 220 µs per loop
# proposed method
%timeit weighted_random(weights, 1000)
# 10000 loops, best of 3: 158 µs per loop
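Since the question mentions that the sampling runs thousands of times in an outer loop, note that if the weights do not change between iterations, the cumulative sum can be hoisted out of the loop and computed once. A sketch, assuming fixed weights (weighted_random_precomputed is a hypothetical helper name):

```python
import numpy as np

weights = np.random.randn(1000) ** 2
weights /= weights.sum()

cumsum = np.cumsum(weights)  # compute once, reuse on every iteration

def weighted_random_precomputed(cumsum, n):
    # only the uniform draw and the binary search happen per call
    return np.searchsorted(cumsum, np.random.rand(n))

# outer loop now skips the repeated np.cumsum work
draws = [weighted_random_precomputed(cumsum, 1000) for _ in range(5)]
```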
Now we can check empirically that the probabilities are correct:
samples = np.empty((10000, 1000), dtype=int)
for i in range(10000):
    samples[i, :] = weighted_random(weights, 1000)
empirical = 1. * np.bincount(samples.flatten()) / samples.size
((empirical - weights)**2).max()
# 3.5e-09
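If you are on NumPy 1.17 or later, another option worth timing is the newer Generator API: the same searchsorted inversion works with Generator.random, and Generator methods are often somewhat faster than the legacy np.random functions. This is a sketch to benchmark on your own workload, not a guaranteed win (weighted_random_gen is a hypothetical name):

```python
import numpy as np

rng = np.random.default_rng()

def weighted_random_gen(w, n):
    # same cumulative-sum inversion, using the Generator API for the uniforms
    cumsum = np.cumsum(w)
    return np.searchsorted(cumsum, rng.random(n))

weights = np.random.randn(1000) ** 2
weights /= weights.sum()
idx = weighted_random_gen(weights, 1000)
```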