
Difference between random draws from scipy.stats....rvs and numpy.random

For the same distribution, drawing random samples from numpy.random seems to be faster than drawing them from scipy.stats....rvs. What causes the speed difference between the two?

joon asked Oct 22 '10 23:10



2 Answers

scipy.stats.uniform actually uses numpy. Here is the corresponding function in stats (mtrand is an alias for numpy.random):

class uniform_gen(rv_continuous):
    def _rvs(self):
        return mtrand.uniform(0.0,1.0,self._size)

scipy.stats has a bit of overhead for error checking and for making the interface more flexible. The speed difference should be minimal as long as you don't call uniform.rvs in a loop for each draw. Instead, you can get all random draws at once, for example 10 million:

>>> from scipy import stats
>>> rvs = stats.uniform.rvs(size=(10000, 1000))
>>> rvs.shape
(10000, 1000)
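
One way to check that scipy.stats draws from numpy under the hood is to seed both with the same state and compare the results. This is only a sketch: it assumes a SciPy version whose rvs accepts a random_state argument (not mentioned in the answer above), and the seed and variable names are illustrative.

```python
import numpy as np
from scipy import stats

seed = 12345  # arbitrary seed, chosen only for this illustration

# Draw directly from numpy's Mersenne Twister generator.
direct = np.random.RandomState(seed).uniform(0.0, 1.0, size=5)

# Draw through scipy.stats, handing it an identically seeded generator
# (assumes rvs supports the random_state keyword).
via_scipy = stats.uniform.rvs(size=5, random_state=np.random.RandomState(seed))

# If scipy.stats.uniform simply forwards to numpy, the two arrays should match.
same_stream = np.allclose(direct, via_scipy)
print(same_stream)
```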

Here is the long answer that I wrote a while ago:

The basic random numbers in scipy/numpy are created by the Mersenne Twister PRNG in numpy.random. The random number generators for the distributions in numpy.random are written in cython/pyrex and are pretty fast.

scipy.stats doesn't have its own random number generator; random numbers are obtained in one of three ways:

  • directly from numpy.random, e.g. normal, t, ... pretty fast

  • random numbers by transformation of other random numbers that are available in numpy.random, also pretty fast because this operates on entire arrays of numbers

  • generic: the only generic way to generate random numbers is to use the ppf (inverse cdf) to transform uniform random numbers. This is relatively fast if there is an explicit expression for the ppf, but can be very slow if the ppf has to be calculated indirectly. For example, if only the pdf is defined, then the cdf is obtained through numerical integration and the ppf is obtained through an equation solver. So a few distributions are very slow.
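
The generic route in the last bullet can be sketched as follows. The first part applies an explicit ppf to uniform draws; the second defines a hypothetical distribution (PdfOnlyNormal, a name made up for this example) that supplies nothing but a pdf, so scipy has to fall back to numerical integration plus root finding. The sample sizes and seed are arbitrary, and the sketch assumes rvs accepts a random_state argument.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)  # arbitrary seed for reproducibility

# Inverse transform sampling with an explicit ppf: fast, works on whole arrays.
u = rng.uniform(size=100_000)
expon_draws = stats.expon.ppf(u)  # exponential ppf has a closed form


class PdfOnlyNormal(stats.rv_continuous):
    """Hypothetical distribution that defines only its pdf."""

    def _pdf(self, x):
        return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)


pdf_only = PdfOnlyNormal(name="pdf_only_normal")

# No _rvs, _ppf or _cdf is defined, so the generic machinery integrates the
# pdf numerically and solves for the ppf. Keep the sample tiny: this is slow.
slow_draws = pdf_only.rvs(size=3, random_state=rng)

print(expon_draws.mean(), slow_draws)
```

Raising the size in the second call makes the cost of the indirect path noticeable quickly, which is why distributions that only define a pdf are among the slow ones.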

Josef answered Sep 21 '22 15:09


I ran into this today and wanted to add some timing details to this question. I saw what joon described: random numbers from the normal distribution, in particular, were generated much more quickly with numpy than with rvs in scipy.stats. As user333700 mentioned, rvs has some overhead, but if you are generating an array of random values, the gap to numpy narrows. Here is a Jupyter timing example:

from scipy.stats import norm
import numpy as np

n = norm(0, 1)  # frozen standard normal distribution

# a single draw per call
%timeit -n 1000 n.rvs(1)[0]
%timeit -n 1000 np.random.normal(0,1)

# 1000 draws: one rvs call, a Python loop over numpy, one numpy call
%timeit -n 1000 a = n.rvs(1000)
%timeit -n 1000 a = [np.random.normal(0,1) for i in range(0, 1000)]
%timeit -n 1000 a = np.random.randn(1000)

This, on my run with numpy version 1.11.1 and scipy 0.17.0, outputs:

1000 loops, best of 3: 46.8 µs per loop
1000 loops, best of 3: 492 ns per loop
1000 loops, best of 3: 115 µs per loop
1000 loops, best of 3: 343 µs per loop
1000 loops, best of 3: 61.9 µs per loop

So generating a single random sample with rvs was almost 100x slower than using numpy directly. However, if you are generating an array of values, then the gap closes (115 vs. 61.9 microseconds).

If you can avoid it, probably don't call rvs to get one random value a ton of times in a loop.
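
Outside Jupyter, the same comparison can be sketched with the standard timeit module. The draw count, repeat count, and function names here are arbitrary choices for illustration; actual timings depend on the machine and library versions, so none are quoted.

```python
import timeit

import numpy as np
from scipy.stats import norm

n_draws = 1000  # arbitrary sample size for the illustration
repeats = 5     # arbitrary number of timing runs
dist = norm(0, 1)


def looped():
    # One rvs call per value: pays the scipy.stats overhead n_draws times.
    return np.array([dist.rvs() for _ in range(n_draws)])


def batched():
    # One rvs call for the whole array: pays the overhead once.
    return dist.rvs(size=n_draws)


t_looped = timeit.timeit(looped, number=repeats)
t_batched = timeit.timeit(batched, number=repeats)
print(f"looped:  {t_looped / repeats:.6f} s per run")
print(f"batched: {t_batched / repeats:.6f} s per run")
```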

Paul answered Sep 18 '22 15:09