scikit-learn random state in splitting dataset

Can anyone tell me why we set the random state to zero when splitting the data into train and test sets?

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=0)

I have also seen cases like this one, where the random state is set to 1:

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=1)

What effect does this random state have in cross-validation as well?

586

asked Feb 12 '17 18:02

Shelly

2 Answers

It doesn't matter whether random_state is 0, 1, or any other integer. What matters is that it is set to the same value every time if you want to validate your processing over multiple runs of the code. Incidentally, I have seen random_state=42 used in many official scikit-learn examples and elsewhere.

random_state, as the name suggests, is used to initialize the internal random number generator, which in your case decides how the data is split into train and test indices. The documentation states:

If random_state is None or np.random, then a randomly-initialized RandomState object is returned.

If random_state is an integer, then it is used to seed a new RandomState object.

If random_state is a RandomState object, then it is passed through.
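The three documented cases can be tried directly. The following is a minimal sketch, assuming the helper being described is sklearn.utils.check_random_state; the variable names are illustrative only.

```python
import numpy as np
from sklearn.utils import check_random_state

# Integer: seeds a new RandomState, so two generators built from the
# same integer produce the same sequence of draws.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
same_draws = np.array_equal(
    rng_a.randint(0, 100, size=5),
    rng_b.randint(0, 100, size=5),
)

# RandomState instance: passed through unchanged (same object comes back).
existing = np.random.RandomState(1)
passed_through = check_random_state(existing) is existing

# None: a RandomState that was not seeded by the caller is returned.
default_rng = check_random_state(None)

print(same_draws, passed_through, type(default_rng).__name__)
```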

Fixing random_state lets you check and validate your results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless some other source of randomness is present in the process, the results will be the same every time. This helps in verifying the output.
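As a rough illustration of the point above, the sketch below splits a small synthetic dataset (made up for this example, not taken from the question) and compares the held-out labels. The helper name held_out_labels is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, assumed purely for illustration: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)


def held_out_labels(seed):
    """Return the test-set labels produced by splitting with the given seed."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.30, random_state=seed)
    return y_test


# The same seed reproduces exactly the same split on every call.
same_seed_matches = np.array_equal(held_out_labels(0), held_out_labels(0))

# A different seed gives a different (but equally valid) split.
different_seed_matches = np.array_equal(held_out_labels(0), held_out_labels(1))

print(same_seed_matches, different_seed_matches)
```

Neither 0 nor 1 is a better choice here; each simply pins down one particular shuffle of the rows.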

121

answered Sep 21 '22 06:09

Vivek Kumar

If you don't specify random_state in the code, a new random value is generated every time you execute it, so the train and test datasets will contain different values on each run.

However, if you use a particular value for random_state (random_state=1 or any other value), the result will be the same every time, i.e., the train and test datasets will contain the same values.
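The question also asks about cross-validation. The same reasoning appears to carry over to splitters that shuffle, as in this minimal sketch, which assumes KFold with shuffle=True and a small synthetic array; the helper fold_test_indices is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic data, assumed purely for illustration: 50 samples, 2 features.
X = np.arange(100).reshape(50, 2)


def fold_test_indices(random_state):
    """Return the test indices of each fold for a shuffled 5-fold split."""
    cv = KFold(n_splits=5, shuffle=True, random_state=random_state)
    return [test_idx.tolist() for _, test_idx in cv.split(X)]


# A fixed seed yields identical folds every time the splitter is rebuilt.
seeded_folds_match = fold_test_indices(0) == fold_test_indices(0)
print(seeded_folds_match)

# With random_state=None the folds generally change from call to call,
# so scores computed on them are not exactly repeatable.
unseeded_folds = fold_test_indices(None)
print(len(unseeded_folds))
```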

answered Sep 20 '22 06:09

Rishi Bansal


Tags: python, random, machine-learning, scikit-learn
