 

Why is using numpy.random.seed not a good practice?

I want to write reproducible tests that use random numbers as inputs. I am used to invoking rng in MATLAB and numpy.random.seed in Python. However, I noticed that the Notes section of seed's help reads:

This is a convenience, legacy function.

The best practice is to not reseed a BitGenerator, rather to recreate a new one. This method is here for legacy reasons. This example demonstrates best practice.

from numpy.random import MT19937
from numpy.random import RandomState, SeedSequence
rs = RandomState(MT19937(SeedSequence(123456789)))  
# Later, you want to restart the stream
rs = RandomState(MT19937(SeedSequence(987654321)))
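The docstring's advice can be checked with a short sketch, assuming NumPy 1.17 or later (where MT19937 and SeedSequence are available): recreating the RandomState from the same seed restarts the stream exactly, without ever calling seed() on an existing object. The helper make_rs is a hypothetical name used only for illustration.

```python
from numpy.random import MT19937, RandomState, SeedSequence


def make_rs(seed):
    # Hypothetical helper: build a fresh RandomState on a brand-new MT19937 BitGenerator
    return RandomState(MT19937(SeedSequence(seed)))


rs = make_rs(123456789)
first = rs.random_sample(5)

# "Restart the stream" by recreating the object instead of reseeding it
rs = make_rs(123456789)
again = rs.random_sample(5)

# A different seed gives a different stream
other = make_rs(987654321).random_sample(5)
```

Because each stream lives in its own object, two parts of a program (or two tests) cannot disturb each other's sequence the way they can when both reseed one shared generator.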

Does anyone know what the caveats of using seed are, compared to the docstring's suggestion?

asked Aug 30 '19 at 14:08 by Gabriel Gleizer



People also ask

Does random seed work for NumPy?

Yes. numpy.random.seed() sets the state of the global random number generator used by the numpy.random.* functions. If you call it with the same value (for example 1) before generating numbers, you get the same pseudo-random numbers every time.
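A minimal sketch of that behaviour with the legacy global generator: reseeding with the same value rewinds it, while not reseeding lets the stream continue.

```python
import numpy as np

# Seed the global RandomState behind the numpy.random.* functions
np.random.seed(1)
run1 = np.random.rand(3)

# Seeding again with the same value rewinds that generator to the same state
np.random.seed(1)
run2 = np.random.rand(3)

# Without reseeding, the stream simply continues with new values
run3 = np.random.rand(3)
```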

Is NumPy random faster than python random?

Yes, when you generate many numbers at once. A single NumPy call returns a whole numpy.ndarray of random numbers; each call carries more overhead than one call to Python's random.random(), but it replaces a large number of Python-level calls. NumPy is fast when used this way because its arrays are extremely efficient: they are stored like C arrays rather than as Python lists of objects.
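A rough timing sketch using only NumPy and the standard library: it compares a Python-level loop over random.random() with one vectorized call (here through NumPy's newer Generator API, np.random.default_rng) that fills an array. Absolute timings depend on the machine, so the code prints them rather than claiming which is faster.

```python
import random
import timeit

import numpy as np

N = 1000
rng = np.random.default_rng(0)  # NumPy's newer Generator API


def python_loop():
    # One Python-level call per number, collected into a list of floats
    return [random.random() for _ in range(N)]


def numpy_vectorized():
    # A single call fills a contiguous float64 ndarray with N numbers
    return rng.random(N)


t_loop = timeit.timeit(python_loop, number=200)
t_vec = timeit.timeit(numpy_vectorized, number=200)
print(f"Python loop: {t_loop:.4f} s, one NumPy call: {t_vec:.4f} s (200 repeats)")
```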

Why do we use np.random.seed?

The seed function sets the starting state of the random number generator, so that the code generates the same random numbers on multiple executions, on the same machine or on different machines (for a given seed value).

Does python random need to be seeded?

No, Python's random module seeds itself automatically. The random number generator needs a number to start with (a seed value); if none is given, it uses randomness provided by the operating system when available and falls back to the current system time otherwise. Use the seed() method to set the start number yourself, for example to make a run reproducible.
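A small standard-library sketch: reseeding the module-level generator repeats the sequence, and a separate random.Random instance seeded with the same value produces the same numbers without touching the shared module-level state (the same idea the answer below recommends for NumPy).

```python
import random

# Seed the hidden module-level generator and draw a few numbers
random.seed(42)
first = [random.random() for _ in range(3)]

# Reseeding with the same value repeats the sequence
random.seed(42)
second = [random.random() for _ in range(3)]

# An independent Random instance with the same seed gives the same numbers,
# but advancing it does not affect the module-level generator
rng = random.Random(42)
third = [rng.random() for _ in range(3)]
```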


1 Answer

From https://numpy.org/neps/nep-0019-rng-policy.html

The preferred best practice for getting reproducible pseudorandom numbers is to instantiate a generator object with a seed and pass it around. The implicit global RandomState behind the numpy.random.* convenience functions can cause problems, especially when threads or other forms of concurrency are involved. Global state is always problematic. We categorically recommend avoiding using the convenience functions when reproducibility is involved.
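A sketch of the global-state problem described above, assuming NumPy 1.17 or later for default_rng. The function noisy_helper is hypothetical and stands in for any code that draws from the global RandomState: even after np.random.seed(0), its hidden draw shifts every later value by one position, while a Generator that is created with a seed and passed in explicitly is unaffected.

```python
import numpy as np


def noisy_helper():
    # Hypothetical helper that silently draws one number from the global RandomState
    np.random.random_sample()


def simulate_global(n):
    # Relies on the implicit global state behind numpy.random.*
    return np.random.random_sample(n)


def simulate(rng, n):
    # Takes its generator explicitly, so the caller controls the stream
    return rng.random(n)


np.random.seed(0)
baseline = simulate_global(3)

np.random.seed(0)
noisy_helper()  # an unrelated call consumes one value first
shifted = simulate_global(3)

rng = np.random.default_rng(0)
explicit_a = simulate(rng, 3)

rng = np.random.default_rng(0)
noisy_helper()  # touches only the global RandomState, not rng
explicit_b = simulate(rng, 3)
```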

That said, people do use them and use numpy.random.seed() to control the state underneath them. It can be hard to categorize and count API usages consistently and usefully, but a very common usage is in unit tests where many of the problems of global state are less likely.

This NEP does not propose removing these functions or changing them to use the less-stable Generator distribution implementations. Future NEPs might.

Specifically, the initial release of the new PRNG subsystem SHALL leave these convenience functions as aliases to the methods on a global RandomState that is initialized with a Mersenne Twister BitGenerator object. A call to numpy.random.seed() will be forwarded to that BitGenerator object. In addition, the global RandomState instance MUST be accessible in this initial release by the name numpy.random.mtrand._rand: Robert Kern long ago promised scikit-learn that this name would be stable. Whoops.
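A quick check of the forwarding described above, using numpy.random.mtrand._rand, the private name the NEP promises to keep: seeding through np.random.seed() puts that global RandomState in the same state as a freshly constructed RandomState(0), and its BitGenerator is still Mersenne Twister by default.

```python
import numpy as np

# The global RandomState behind numpy.random.* (private name kept for scikit-learn)
global_rs = np.random.mtrand._rand

# np.random.seed() is forwarded to that object's MT19937 BitGenerator
np.random.seed(0)
via_global = global_rs.random_sample(3)

# A fresh RandomState seeded the same way produces the same values
via_fresh = np.random.RandomState(0).random_sample(3)
```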

In order to allow certain workarounds, it MUST be possible to replace the BitGenerator underneath the global RandomState with any other BitGenerator object (we leave the precise API details up to the new subsystem). Calling numpy.random.seed() thereafter SHOULD just pass the given seed to the current BitGenerator object and not attempt to reset the BitGenerator to the Mersenne Twister. The set of numpy.random.* convenience functions SHALL remain the same as they currently are. They SHALL be aliases to the RandomState methods and not the new less-stable distributions class (Generator, in the examples above). Users who want to get the fastest, best distributions can follow best practices and instantiate generator objects explicitly.

This NEP does not propose that these requirements remain in perpetuity. After we have experience with the new PRNG subsystem, we can and should revisit these issues in future NEPs.

answered Nov 15 '22 at 16:11 by aligur
