
Why is numpy.random.choice so slow?

While writing a script I discovered the numpy.random.choice function. I used it because it was much cleaner than the equivalent if statement. However, after running the script I realized it is significantly slower than the if statement.

The following is a minimal working example (MWE). The first method takes 0.0 s, while the second takes 7.2 s. If you scale up the i loop, you will see how quickly the random.choice version slows down.

Can anyone comment on why random.choice is so much slower?

import numpy as np
import numpy.random as rand
import time as tm

#-------------------------------------------------------------------------------
# Method 1: draw a uniform number and map it to a value with an if statement.

tStart = tm.time()
for i in range(100):
    for j in range(1000):
        tmp = rand.rand()
        if tmp < 0.25:
            var = 1
        elif tmp < 0.5:
            var = -1
        else:
            var = 0
print('Time: %.1f s' %(tm.time() - tStart))

#-------------------------------------------------------------------------------
# Method 2: let rand.choice pick the value, one call per sample.

tStart = tm.time()
for i in range(100):
    for j in range(1000):
        var = rand.choice([-1, 0, 1], p = [0.25, 0.5, 0.25])
print('Time: %.1f s' %(tm.time() - tStart))
asked Sep 04 '13 20:09 by Blink



1 Answer

You're using it wrong. Vectorize the operation, or numpy will offer no benefit:

var = numpy.random.choice([-1, 0, 1], size=1000, p=[0.25, 0.5, 0.25])
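
As a rough sketch of how this could slot into the nested loops from the question: draw every sample up front with a single call, then index into the result inside the loops. The names `values`, `probs`, and `samples` are made up for this illustration. The idea is that the per-call setup in choice (turning the list arguments into arrays and checking `p`) is paid once instead of 100,000 times.

```python
import numpy as np

values = [-1, 0, 1]
probs = [0.25, 0.5, 0.25]

# One vectorized call produces all 100 x 1000 samples at once.
samples = np.random.choice(values, size=(100, 1000), p=probs)

for i in range(100):
    for j in range(1000):
        # Look up the pre-drawn sample instead of calling choice() here.
        var = samples[i, j]
```

If the loop body can itself be written as array operations on `samples`, the Python loops can be dropped entirely.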

Timing data:

>>> timeit.timeit('''numpy.random.choice([-1, 0, 1],
...                                      size=1000,
...                                      p=[0.25, 0.5, 0.25])''',
...               'import numpy', number=10000)
2.380380242513752

>>> timeit.timeit('''
... var = []
... for i in range(1000):
...     tmp = rand.rand()
...     if tmp < 0.25:
...         var.append(1)
...     elif tmp < 0.5:
...         var.append(-1)
...     else:
...         var.append(0)''',
... setup='import numpy.random as rand', number=10000)
5.673041396894519
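
For anyone rerunning the comparison on Python 3, here is a self-contained sketch that times the three approaches side by side. The function names and the repeat count are placeholders chosen for this example. Absolute timings depend on the machine and the NumPy version, so no expected numbers are shown.

```python
import timeit

import numpy as np

VALUES = [-1, 0, 1]
PROBS = [0.25, 0.5, 0.25]


def if_chain(n):
    """One rand() call per sample, mapped to a value with an if/elif chain."""
    out = []
    for _ in range(n):
        tmp = np.random.rand()
        if tmp < 0.25:
            out.append(1)
        elif tmp < 0.5:
            out.append(-1)
        else:
            out.append(0)
    return out


def scalar_choice(n):
    """One choice() call per sample, so the per-call setup is paid n times."""
    return [np.random.choice(VALUES, p=PROBS) for _ in range(n)]


def vectorized_choice(n):
    """A single choice() call that returns all n samples as an array."""
    return np.random.choice(VALUES, size=n, p=PROBS)


n = 1000
for func in (if_chain, scalar_choice, vectorized_choice):
    # Time 20 repetitions of drawing n samples with each approach.
    seconds = timeit.timeit(lambda: func(n), number=20)
    print('%-18s %.4f s' % (func.__name__, seconds))
```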
answered Oct 18 '22 13:10 by user2357112 supports Monica