Can someone explain to me what random_state means in the example below?
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
Why is it hard coded to 42?
The random_state parameter of the train_test_split() function controls the shuffling that is applied to the data before it is split. With random_state=None (the default), the shuffle draws from NumPy's global random state, so we typically get different train and test sets across different executions. With random_state=0 (or any other fixed integer), we get the same train and test sets across different executions.
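A minimal sketch of that behaviour, reusing the toy data from the question. The helper name `split` is hypothetical, introduced only for this illustration; the check simply calls train_test_split twice with the same integer seed and compares every returned piece.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy data as in the question.
X, y = np.arange(10).reshape((5, 2)), range(5)

def split(seed):
    """Hypothetical helper: return (X_train, X_test, y_train, y_test) for one seed."""
    return train_test_split(X, y, test_size=0.33, random_state=seed)

first = split(0)
second = split(0)

# With a fixed integer seed, every returned piece is identical across calls.
for a, b in zip(first, second):
    assert np.array_equal(a, b)
```

With 5 samples and test_size=0.33, two rows end up in the test set and three in the training set; which two they are depends only on the seed.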
NumPy's RandomState exposes a number of methods for generating random numbers drawn from a variety of probability distributions. In addition to the distribution-specific arguments, each method takes a keyword argument size that defaults to None. If size is None, then a single value is generated and returned.
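A short sketch of the size argument on NumPy's legacy RandomState, using only normal and randint. The seed 42 is arbitrary; the last line shows that re-creating the generator with the same seed replays the same first draw, which is the property train_test_split relies on.

```python
import numpy as np

rng = np.random.RandomState(42)          # a seeded legacy NumPy generator

one = rng.normal()                       # size=None -> a single float
many = rng.normal(size=3)                # size=3 -> ndarray of shape (3,)
ints = rng.randint(0, 10, size=(2, 2))   # 2x2 ndarray of ints in [0, 10)

# A fresh generator with the same seed produces the same first value.
assert np.random.RandomState(42).normal() == one
```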
From an exercise on splitting data into train and test sets, which asks: what is the use of random_state=85? The random_state still selects the data randomly, but with a twist: for a particular value of random_state, the selection and the order of the data are the same every time.
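A sketch of the "same order" point, assuming a 10-row toy dataset where row i of X is [2i, 2i+1] and its label is i. Comparing with np.array_equal checks the order of the rows, not just which rows were chosen, and the last check shows X and y are shuffled together so rows stay aligned with their labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))   # row i is [2*i, 2*i + 1]
y = np.arange(10)                    # label of row i is i

# Two independent calls with the same random_state=85.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=85)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=85)

# Same rows AND same order in both runs.
same_order = np.array_equal(y_tr1, y_tr2) and np.array_equal(X_tr1, X_tr2)

# Rows and labels were shuffled together, so they still match up.
aligned = np.array_equal(X_tr1[:, 0], 2 * y_tr1)
```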
The random_state is typically an integer value that selects one particular random combination of train and test rows. When you set test_size to 1/4, there are many possible ways (combinations) of assigning the rows to the train and test sets, and each random_state value deterministically picks one of them.
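A small experiment to make the "combinations" idea concrete, assuming an 8-row toy dataset: with test_size=0.25, two rows go to the test set, so there are comb(8, 2) = 28 possible test sets. Trying 200 seeds shows that each seed picks one of those combinations, and that several seeds can land on the same one.

```python
from math import comb

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(16).reshape((8, 2))
y = np.arange(8)

# 8 rows, test_size=0.25 -> 2 test rows -> comb(8, 2) possible test sets.
n_possible = comb(8, 2)

# Record which test set each seed selects.
seen = {}
for seed in range(200):
    _, _, _, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)
    seen.setdefault(frozenset(y_test.tolist()), []).append(seed)

# There are more seeds than combinations, so some combinations are shared.
shared = [seeds for seeds in seen.values() if len(seeds) > 1]
```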
Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.
On a serious note, random_state simply sets a seed for the random number generator, so that your train-test splits are always deterministic. If you don't set a seed, the split is different each time.
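One consequence worth seeing, sketched on a hypothetical 10-row dataset: with an integer seed the split gets its own generator, so drawing numbers from NumPy's global generator in between (here np.random.rand) does not change the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))
y = np.arange(10)

before = train_test_split(X, y, test_size=0.3, random_state=42)

# Consume some numbers from NumPy's global generator in between.
np.random.rand(1000)

after = train_test_split(X, y, test_size=0.3, random_state=42)

# The integer seed isolates the split from the global draws above.
same = all(np.array_equal(a, b) for a, b in zip(before, after))
```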
Relevant documentation:
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
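A sketch of the three accepted forms. It assumes, as scikit-learn's check_random_state does in the versions I know of, that an int seed is turned into a fresh RandomState with that seed, so it behaves like passing RandomState(42) directly. For None, re-seeding the global generator with np.random.seed before each call makes the split repeatable too.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))
y = np.arange(10)

# 1) int: used as the seed of a new random number generator.
a = train_test_split(X, y, test_size=0.3, random_state=42)

# 2) RandomState instance: this generator itself is used (and advanced).
b = train_test_split(X, y, test_size=0.3, random_state=np.random.RandomState(42))

# 3) None: falls back to the global RandomState behind np.random,
#    so seeding np.random before each call also reproduces the split.
np.random.seed(0)
c1 = train_test_split(X, y, test_size=0.3, random_state=None)
np.random.seed(0)
c2 = train_test_split(X, y, test_size=0.3, random_state=None)
```

Note that reusing the same RandomState instance across several calls advances it, so those calls generally produce different splits; pass an int if you want each call to be independent and repeatable.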
If you don't specify random_state in your code, then every time you run (execute) your code, a new random value is generated and the train and test datasets will contain different values each time.
However, if a fixed value is assigned, such as random_state=0, 1, 42 or any other integer, then no matter how many times you execute your code, the result will be the same, i.e., the same values in the train and test datasets.