How to get absolutely reproducible results with Scikit Learn?

Regarding the seeding system when running machine learning algorithms with scikit-learn, three different things are usually mentioned:

  • random.seed
  • np.random.seed
  • random_state in scikit-learn (cross-validation iterators, ML algorithms, etc.)
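
For reference, a minimal sketch showing where each of these three knobs lives (illustrative values only):

import random

import numpy as np
from sklearn.model_selection import KFold

random.seed(0)     # Python's built-in RNG (the random module)
np.random.seed(0)  # NumPy's global RNG, which scikit-learn falls back to
kf = KFold(n_splits=5, shuffle=True, random_state=0)  # per-object seed in scikit-learn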

I already have in mind this scikit-learn FAQ entry about how to fix the global seeding system, and articles which point out that this should not simply be an FAQ entry.

My ultimate question is: how can I get absolutely reproducible results when running an ML algorithm with scikit-learn?

In more detail,

  • If I only use np.random.seed and do not specify any random_state in scikit-learn, will my results be absolutely reproducible?

and one question at least for the sake of knowledge:

  • How exactly are np.random.seed and scikit-learn's random_state internally related? How does np.random.seed affect the seeding system (random_state) of scikit-learn and make it (at least hypothetically speaking) reproduce the same results?
Outcast asked Oct 10 '18




1 Answer

Defining a random seed makes sure that every time you run the algorithm, the random number generator produces the same sequence of numbers. IMHO, the result will always be the same as long as we use the same data and the same values for all other parameters.

As you have read in sklearn's FAQ, it makes no difference whether you define the seed globally via numpy.random.seed() or set the random_state parameter in all algorithms involved, provided that you use the same number in both cases.

I'll take an example from the sklearn docs to illustrate it.

import numpy as np
from sklearn.model_selection import train_test_split
# np.random.seed(42)
X, y = np.arange(10).reshape((5, 2)), range(5)

#1 running this many times, Xtr will remain [[4, 5],[0, 1],[6, 7]].
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=42)

#2 try running this line many times; you will get a different Xtr each time
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33)

Now uncomment the np.random.seed(42) line and re-run the script from the top: the first execution of line #2 will again give Xtr == [[4, 5], [0, 1], [6, 7]]. Note that this holds per fresh run; calling line #2 repeatedly within the same session, without re-seeding, still produces different splits, because each call consumes state from the global generator.
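
A small sketch of that caveat, reusing X and y from the snippet above:

np.random.seed(42)
# first call draws from a freshly seeded global RNG: same split as random_state=42
Xtr1, Xte1, ytr1, yte1 = train_test_split(X, y, test_size=0.33)
# second call continues from the already-advanced RNG state: a different split
Xtr2, Xte2, ytr2, yte2 = train_test_split(X, y, test_size=0.33)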

Calling numpy.random.seed() with no argument sets the seed to its default (None); NumPy then tries to read entropy from /dev/urandom (or the Windows analogue) if available, or seeds from the clock otherwise (see the numpy.random.seed docs).
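
To see how random_state and the global NumPy generator are tied together, sklearn.utils.check_random_state (the helper scikit-learn uses internally to resolve random_state) is instructive. A minimal sketch, assuming the current legacy RandomState-based implementation:

import numpy as np
from sklearn.utils import check_random_state

# random_state=None resolves to NumPy's global RandomState singleton,
# i.e. the very generator that np.random.seed() re-seeds
rng = check_random_state(None)
print(rng is np.random.mtrand._rand)    # True

# an integer random_state creates a fresh, dedicated RandomState;
# re-seeding the global generator does not affect it
rng42 = check_random_state(42)
np.random.seed(0)
print(rng42 is np.random.mtrand._rand)  # False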

ipramusinto answered Oct 17 '22