Why is np.random.default_rng().permutation(n) preferred over the original np.random.permutation(n)?

The NumPy documentation on np.random.permutation recommends that all new code use np.random.default_rng() from the Random Generator API. From the documentation, I gather that the Random Generator API standardizes the generation of a wide variety of random distributions on top of a pluggable BitGenerator, instead of being hard-wired to the Mersenne Twister, which I'm vaguely familiar with.

I see one downside. What used to be a single line of code for a simple permutation:

np.random.permutation(10)

turns into two lines of code now, which feels a little awkward for such a simple task:

rng = np.random.default_rng()
rng.permutation(10)
  • Why is this new approach an improvement over the previous approach?
  • And why wouldn't existing methods like np.random.permutation just wrap this new preferred method?
  • Is there a good reason not to use this new method as a one-liner np.random.default_rng().permutation(10), assuming it's not being called at high volumes?
  • Is there an argument for switching existing code to this method?
Asked by David Parks, Oct 11 '25 20:10

1 Answer

Some context:

  • Does numpy.random.seed() always give the same random number every time?
  • NumPy: Decide on new PRNG BitGenerator default

To your questions, in a logical order:

And why wouldn't existing methods like np.random.permutation just wrap this new preferred method?

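Probably because of backward-compatibility concerns. Even if the "top-level" API did not change, its internals would change significantly enough to count as a compatibility break: code that calls np.random.seed(42) and then np.random.permutation(10) would suddenly get a different result.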
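A small sketch of the reproducibility contract that rewiring np.random.permutation would break. It assumes NumPy 1.17+ (so that default_rng() exists) and deliberately makes no claim about the specific numbers printed:

```python
import numpy as np

# Seeding the legacy global state pins the MT19937 stream, so the old
# one-liner gives the same permutation every time it is reseeded.
np.random.seed(42)
first = np.random.permutation(10)
np.random.seed(42)
second = np.random.permutation(10)

# np.random.seed() reseeds a hidden global RandomState instance, so an
# explicit RandomState with the same seed yields the same permutation.
legacy = np.random.RandomState(42).permutation(10)

# The new API with the same seed draws from a different bit generator
# (PCG64), so nothing ties its output to the legacy sequence.
modern = np.random.default_rng(42).permutation(10)

print(first, second, legacy, modern, sep="\n")
```

If np.random.permutation were quietly rerouted through a Generator, first and legacy would no longer agree with what existing seeded code has always produced.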

Why is this new approach an improvement over the previous approach?

"By default, Generator uses bits provided by PCG64 which has better statistical properties than the legacy MT19937 used in RandomState." (source). The PCG64 docstring provides more technical detail.
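To see which bit generator sits underneath each API, here is a quick check (a sketch assuming NumPy 1.17+; bit_generator, get_state() and Generator(MT19937(...)) are existing NumPy interfaces):

```python
import numpy as np

# default_rng() wraps a PCG64 bit generator in a Generator.
rng = np.random.default_rng(0)
print(type(rng.bit_generator).__name__)  # PCG64

# The legacy RandomState is hard-wired to the Mersenne Twister; its
# (legacy-format) state tuple is tagged with the algorithm name.
legacy = np.random.RandomState(0)
print(legacy.get_state()[0])  # MT19937

# With the new API the bit generator is pluggable: the same Generator
# methods can run on top of MT19937 if that is ever needed.
mt_rng = np.random.Generator(np.random.MT19937(0))
print(mt_rng.permutation(5))
```

Note that Generator(MT19937(0)) is not a way to reproduce RandomState(0) output: as far as I know, the new bit generators are seeded through SeedSequence, while RandomState keeps the legacy seeding.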

Is there a good reason not to use this new method as a one-liner np.random.default_rng().permutation(10), assuming it's not being called at high volumes?

I very much agree that it's a slightly awkward extra line of code if it's done at the start of a module. I would only point out that the NumPy docs use the one-liner form directly in docstring examples, such as:

n = np.random.default_rng().standard_exponential((3, 8000))

The slight difference is that the two-line form typically instantiates the Generator at module load/import time, whereas in your one-liner form the instantiation happens later, at the call site. But that should be a minuscule difference (again, assuming it's only done once or a handful of times). If you look at the default_rng(seed) source, when called with seed=None it just returns Generator(PCG64(seed)) after a few quick checks on seed.
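Roughly, the two forms and what default_rng() does under the hood look like this (a sketch, assuming NumPy 1.17+; the variable names are just for illustration):

```python
import numpy as np

# One-liner: builds a fresh Generator, seeded from OS entropy, on each call.
p1 = np.random.default_rng().permutation(10)

# Module-level form: build the Generator once and reuse it.
rng = np.random.default_rng()
p2 = rng.permutation(10)
p3 = rng.permutation(10)

# With an integer seed, default_rng(seed) is equivalent to
# Generator(PCG64(seed)), so these two produce identical permutations.
a = np.random.default_rng(123).permutation(10)
b = np.random.Generator(np.random.PCG64(123)).permutation(10)
print((a == b).all())  # True

# One of the "quick checks": passing an existing Generator returns it unchanged.
print(np.random.default_rng(rng) is rng)  # True
```

One consequence worth keeping in mind: with a fixed seed, the one-liner np.random.default_rng(0).permutation(10) returns the same permutation on every call, whereas a reused rng keeps advancing its state.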

Is there an argument for switching existing code to this method?

I'm going to pass on this one, since I don't have anywhere near the depth of technical knowledge to give a good comparison of the algorithms, and also because it depends on other variables, such as whether you need your downstream code to stay compatible with older versions of NumPy (before 1.17), where default_rng() simply doesn't exist.
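If older NumPy support does matter, one option is a small shim. make_rng below is a hypothetical helper, not a NumPy API, and the two branches draw from different streams, so seeded results will not match across NumPy versions:

```python
import numpy as np


def make_rng(seed=None):
    """Hypothetical helper: a Generator on NumPy >= 1.17, else a RandomState.

    Both objects expose .permutation(n), so callers need not care which one
    they get back.
    """
    if hasattr(np.random, "default_rng"):
        return np.random.default_rng(seed)
    # Older NumPy: fall back to the legacy Mersenne Twister interface.
    return np.random.RandomState(seed)


perm = make_rng(0).permutation(10)
print(perm)
```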

Answered by Brad Solomon, Oct 14 '25 11:10


