
Python: faster alternative to numpy's random.choice()?

I'm trying to sample 1000 numbers between 0 and 999, with a vector of weights dictating the probability that a particular number will be chosen:

import numpy as np
resampled_indices = np.random.choice(a = 1000, size = 1000, replace = True, p = weights)

Unfortunately, this process has to be run thousands of times in a larger for loop, and it seems that np.random.choice is the main speed bottleneck in the process. As such, I was wondering if there's any way to speed up np.random.choice or to use an alternative method that gives the same results.

asked May 27 '18 by InquisitiveInquirer



1 Answer

It seems you can go slightly faster by drawing uniform samples and then "inverting" the cumulative distribution with np.searchsorted:

# assume arbitrary probabilities
weights = np.random.randn(1000)**2
weights /= weights.sum()

def weighted_random(w, n):
    # invert the CDF: find where each uniform draw lands in the cumulative weights
    cumsum = np.cumsum(w)
    rdm_unif = np.random.rand(n)
    return np.searchsorted(cumsum, rdm_unif)

# first method
%timeit np.random.choice(a = 1000, size = 1000, replace = True, p = weights)
# 10000 loops, best of 3: 220 µs per loop

# proposed method
%timeit weighted_random(weights, 1000)
# 10000 loops, best of 3: 158 µs per loop

Now we can check empirically that the probabilities are correct:

samples = np.empty((10000, 1000), dtype=int)
for i in range(10000):
    samples[i, :] = weighted_random(weights, 1000)

empirical = 1. * np.bincount(samples.flatten()) / samples.size
((empirical - weights)**2).max()
# 3.5e-09
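As a side note (my addition, not part of the original answer): on NumPy >= 1.17 the Generator API is another option worth benchmarking, since np.random.default_rng().choice() uses a newer implementation than the legacy np.random.choice. A minimal sketch:

```python
import numpy as np

# Sketch: same weighted sampling via the newer Generator API (NumPy >= 1.17).
# The seed value 0 is arbitrary, chosen only for reproducibility.
rng = np.random.default_rng(0)

weights = rng.standard_normal(1000) ** 2
weights /= weights.sum()

# Draw 1000 indices in [0, 1000) with probabilities given by `weights`.
samples = rng.choice(1000, size=1000, replace=True, p=weights)

print(samples.shape)                                # (1000,)
print(samples.min() >= 0 and samples.max() < 1000)  # True
```

Whether this beats the searchsorted trick depends on your NumPy version and problem size, so it is worth timing both on your own workload.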
answered Sep 30 '22 by M1L0U