I'm trying to sample 1000 numbers between 0 and 999, with a vector of weights dictating the probability that a particular number will be chosen:
import numpy as np
resampled_indices = np.random.choice(a = 1000, size = 1000, replace = True, p = weights)
Unfortunately, this process has to be run thousands of times in a larger for loop, and it seems that np.random.choice
is the main speed bottleneck in the process. As such, I was wondering if there's any way to speed up np.random.choice
or to use an alternative method that gives the same results.
np.random.choice returns randomly selected elements from a given sequence (or from range(a) when a is an integer); with size set it returns a numpy.ndarray of samples, and with p set each element is drawn with the supplied probability. The reason NumPy is fast when used right is that its arrays are efficient C-style arrays rather than Python lists, so one vectorized call producing 1000 samples is far cheaper than 1000 scalar calls.
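As a minimal illustration of the vectorized call described above (the uniform weight vector here is made up for the example):

```python
import numpy as np

# toy weights for illustration: any non-negative vector that sums to 1
weights = np.ones(1000) / 1000.0

# one vectorized call draws all 1000 samples at once
resampled_indices = np.random.choice(a=1000, size=1000, replace=True, p=weights)
print(resampled_indices.shape)  # (1000,)
```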
It seems you can do slightly better by using uniform sampling and then "inverting" the cumulative distribution with np.searchsorted:
# assume arbitrary probabilities
weights = np.random.randn(1000)**2
weights /= weights.sum()

def weighted_random(w, n):
    cumsum = np.cumsum(w)
    rdm_unif = np.random.rand(n)
    return np.searchsorted(cumsum, rdm_unif)
# first method
%timeit np.random.choice(a = 1000, size = 1000, replace = True, p = weights)
# 10000 loops, best of 3: 220 µs per loop
# proposed method
%timeit weighted_random(weights, 1000)
# 10000 loops, best of 3: 158 µs per loop
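Since the question mentions that the sampling runs thousands of times in an outer loop, note that if the weights do not change between iterations, the cumulative sum can be hoisted out of the loop and computed once. A sketch, assuming fixed weights (weighted_random_precomputed is a hypothetical helper name):

```python
import numpy as np

weights = np.random.randn(1000) ** 2
weights /= weights.sum()

cumsum = np.cumsum(weights)  # compute once, reuse on every iteration

def weighted_random_precomputed(cumsum, n):
    # only the uniform draw and the binary search happen per call
    return np.searchsorted(cumsum, np.random.rand(n))

# outer loop now skips the repeated np.cumsum work
draws = [weighted_random_precomputed(cumsum, 1000) for _ in range(5)]
```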
Now we can check empirically that the probabilities are correct:
samples = np.empty((10000, 1000), dtype=int)
for i in range(10000):
    samples[i, :] = weighted_random(weights, 1000)
empirical = 1. * np.bincount(samples.flatten()) / samples.size
((empirical - weights)**2).max()
# 3.5e-09
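If you are on NumPy 1.17 or later, another option worth timing is the newer Generator API: the same searchsorted inversion works with Generator.random, and Generator methods are often somewhat faster than the legacy np.random functions. This is a sketch to benchmark on your own workload, not a guaranteed win (weighted_random_gen is a hypothetical name):

```python
import numpy as np

rng = np.random.default_rng()

def weighted_random_gen(w, n):
    # same cumulative-sum inversion, using the Generator API for the uniforms
    cumsum = np.cumsum(w)
    return np.searchsorted(cumsum, rng.random(n))

weights = np.random.randn(1000) ** 2
weights /= weights.sum()
idx = weighted_random_gen(weights, 1000)
```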