Can anyone tell me why we set random_state to zero when splitting the data into train and test sets?
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
I have seen situations like this where random state is set to 1!
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
What is the consequence of this random state in cross validation as well?
The random_state parameter of the train_test_split() function controls the shuffling applied to the data before it is split. With random_state=None, we get different train and test sets on every execution, so the split cannot be reproduced. With random_state=0 (or any other fixed integer), we get the same train and test sets on every execution.
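A minimal sketch of that behaviour, assuming scikit-learn and NumPy are installed. The toy arrays and the helper name split_test_labels are made up for illustration and are not from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 20 samples, 2 features, labels 0..19.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

def split_test_labels(seed):
    """Return the test labels produced by a 70/30 split with the given seed."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.30, random_state=seed)
    return y_test.tolist()

same_a = split_test_labels(0)
same_b = split_test_labels(0)
other = split_test_labels(1)

# Two calls with the same seed give the same test set.
print(same_a == same_b)
# A different seed gives a different, but equally repeatable, test set.
print(same_a, other)
```

Calling the helper with seed=None instead would draw a fresh split on each call.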
train_test_split allocates rows of data to the output randomly. Therefore, every time you run train_test_split with the default settings, the output data will contain a different random selection of observations from the input data. Setting random_state makes the split repeatable.
Think of the random state as the lot number of a set that was generated randomly. We can specify the same lot number whenever we want the same set again.
You should provide either train_size or test_size. If neither is given, the default share of the dataset used for testing is 0.25, or 25 percent. random_state is the object that controls randomization during splitting. It can be an int, an instance of RandomState, or None (the default).
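A short sketch of both points, assuming scikit-learn and NumPy are installed. The 100-row toy dataset is made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Neither train_size nor test_size is given, so 25% of the rows go to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(len(X_train), len(X_test))  # 75 25

# random_state may also be a RandomState instance instead of an int.
rng = np.random.RandomState(0)
_, _, _, y_test_rng = train_test_split(X, y, random_state=rng)

# A freshly created RandomState(0) should reproduce the split made with random_state=0.
print(np.array_equal(y_test, y_test_rng))
```

Note that a RandomState instance is consumed as it is used, so reusing the same rng object for a second call will generally give a different split, whereas an int reseeds a new generator on every call.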
It doesn't matter whether random_state is 0 or 1 or any other integer. What matters is that it is set to the same value each time if you want to validate your processing over multiple runs of the code. By the way, I have seen random_state=42 used in many official scikit-learn examples as well as elsewhere.
random_state, as the name suggests, is used for initializing the internal random number generator, which decides how the data is split into train and test indices in your case. The documentation states:
If random_state is None or np.random, then a randomly-initialized RandomState object is returned.
If random_state is an integer, then it is used to seed a new RandomState object.
If random_state is a RandomState object, then it is passed through.
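These three rules appear to correspond to scikit-learn's helper sklearn.utils.check_random_state (an assumption; the answer does not name the function). A minimal sketch of each case, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seeds a new RandomState object.
rng_from_int = check_random_state(0)
print(isinstance(rng_from_int, np.random.RandomState))  # True

# An existing RandomState object is passed through unchanged.
existing = np.random.RandomState(0)
passed_through = check_random_state(existing)
print(passed_through is existing)  # True

# None yields a RandomState that was not seeded by us, so its output varies between runs.
rng_from_none = check_random_state(None)
print(isinstance(rng_from_none, np.random.RandomState))  # True

# Two generators seeded with the same integer produce the same numbers.
first_draw = check_random_state(0).randint(0, 1000, size=5).tolist()
second_draw = check_random_state(0).randint(0, 1000, size=5).tolist()
print(first_draw == second_draw)  # True
```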
This lets you check and validate the results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless some other source of randomness is present in the process, the results will be the same on every run. This helps in verifying the output.
If you don't specify random_state in the code, a new random seed is used every time you execute it, and the train and test datasets will contain different values each time.
However, if you use a particular value for random_state (random_state=1 or any other value), the result will be the same every time, i.e., the same values in the train and test datasets.