Should I use `random.seed` or `numpy.random.seed` to control random number generation in `scikit-learn`?

Tags: python, random, numpy, random-seed, scikit-learn

I'm using scikit-learn and numpy and I want to set the global seed so that my work is reproducible.

Should I use numpy.random.seed or random.seed?

From the link in the comments, I understand that they are different, and that the numpy version is not thread-safe. I want to know specifically which one to use to create IPython notebooks for data analysis. Some of the algorithms from scikit-learn involve generating random numbers, and I want to be sure that the notebook shows the same results on every run.


asked Jun 25 '15 17:06

shadowtalker

1 Answer

Should I use np.random.seed or random.seed?

That depends on whether your code uses numpy's random number generator or the one in the standard-library random module.

The random number generators in numpy.random and random have totally separate internal states, so numpy.random.seed() will not affect the random sequences produced by random.random(), and likewise random.seed() will not affect numpy.random.randn() etc. If you are using both random and numpy.random in your code then you will need to separately set the seeds for both.
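A minimal sketch of the separate-state point, using only the standard library and NumPy's legacy global functions. The helper names draw_both and seed_everything are hypothetical, not part of either library:

```python
import random

import numpy as np


def draw_both():
    # One draw from the stdlib generator and one from NumPy's global generator.
    return random.random(), np.random.rand()


# Seed only NumPy: its draws repeat, but the stdlib stream just continues.
np.random.seed(0)
py_a, np_a = draw_both()
np.random.seed(0)
py_b, np_b = draw_both()
# np_a == np_b, while py_a and py_b are two successive, unrelated stdlib draws.


def seed_everything(seed):
    # Hypothetical helper: seed both generators when code uses both.
    random.seed(seed)
    np.random.seed(seed)
```

Calling seed_everything(7) before each run makes both values returned by draw_both() repeat.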

Update

Your question seems to be specifically about scikit-learn's random number generators. As far as I can tell, scikit-learn uses numpy.random throughout, so you should use np.random.seed() rather than random.seed().
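A minimal sketch, assuming scikit-learn's train_test_split as a representative randomized function (the toy data and the helper split_after_global_seed are made up for illustration). When random_state is left at its default of None, scikit-learn falls back to NumPy's global RandomState, so seeding it with np.random.seed() fixes the result:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)


def split_after_global_seed(seed):
    # random_state=None (the default) makes scikit-learn draw from NumPy's
    # global RandomState, so np.random.seed() determines the shuffle.
    np.random.seed(seed)
    _, _, _, y_test = train_test_split(X, y, test_size=3)
    return y_test


run_1 = split_after_global_seed(42)
run_2 = split_after_global_seed(42)

# Passing random_state explicitly makes this one call reproducible on its own,
# regardless of how many draws other code has taken from the global stream.
_, _, _, y_test_explicit = train_test_split(X, y, test_size=3, random_state=42)
```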

One important caveat is that np.random is not threadsafe. A related problem arises with subprocesses: if you set a global seed, then launch several subprocesses (e.g. via fork) and generate random numbers within them using np.random, each subprocess will inherit a copy of the RNG state from its parent, meaning that you will get identical random variates in each subprocess. The usual way around this problem is to pass a different seed (or numpy.random.RandomState instance) to each subprocess, so that each one has a separate local RNG state.
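Whether child processes are forked or spawned depends on the platform, so this sketch only simulates the inheritance: it copies the parent's global state into two fresh RandomState objects with np.random.get_state() and set_state() to mimic two forked children, then shows the fix of giving each worker its own seed. The function names are hypothetical:

```python
import numpy as np

np.random.seed(0)  # global seed set in the "parent"


def simulated_child_draws(parent_state, n=5):
    # A forked child starts with a copy of the parent's RNG state.
    # We mimic that by loading the same state into a fresh RandomState.
    rng = np.random.RandomState()
    rng.set_state(parent_state)
    return rng.random_sample(n)


state = np.random.get_state()
child_1 = simulated_child_draws(state)
child_2 = simulated_child_draws(state)
# child_1 and child_2 are identical: the "workers" repeat each other.


def child_draws_with_own_seed(seed, n=5):
    # Fix: each worker builds its own local RandomState from its own seed.
    return np.random.RandomState(seed).random_sample(n)


fixed_1 = child_draws_with_own_seed(1)
fixed_2 = child_draws_with_own_seed(2)
```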

Since some parts of scikit-learn can run in parallel using joblib, you will see that some classes and functions have an option to pass either a seed or an np.random.RandomState instance (e.g. the random_state= parameter to sklearn.decomposition.MiniBatchSparsePCA). I tend to use a single global seed for a script, then generate new random seeds based on the global seed for any parallel functions.
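A minimal sketch of the "one global seed, derived child seeds" pattern, assuming joblib (which scikit-learn uses for parallelism) is installed. The task function noisy_mean and the constant GLOBAL_SEED are hypothetical; the threading backend is used only to keep the example simple, and the same seeding pattern applies with a process-based backend:

```python
import numpy as np
from joblib import Parallel, delayed

GLOBAL_SEED = 12345  # hypothetical script-wide seed


def noisy_mean(seed, n=1000):
    # Each task builds its own local RandomState from the seed it is given,
    # so it never reads or advances the global np.random state.
    rng = np.random.RandomState(seed)
    return rng.normal(size=n).mean()


def run_tasks(global_seed, n_tasks=4):
    # Derive one child seed per task from the single global seed.
    master = np.random.RandomState(global_seed)
    child_seeds = master.randint(0, 2**31 - 1, size=n_tasks)
    return Parallel(n_jobs=2, backend="threading")(
        delayed(noisy_mean)(int(s)) for s in child_seeds
    )


results_a = run_tasks(GLOBAL_SEED)
results_b = run_tasks(GLOBAL_SEED)
```

Because the child seeds depend only on GLOBAL_SEED, rerunning run_tasks returns the same list, while each task still gets its own stream. The same derived integers could instead be passed as random_state= to scikit-learn estimators.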


answered Sep 28 '22 17:09

ali_m
