 

What is "random-state" in sklearn.model_selection.train_test_split example? [duplicate]


Can someone explain to me what random_state means in the example below?

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
```

Why is it hard-coded to 42?

asked Mar 07 '18 at 09:03 by Saurabh


People also ask

What is random_state in scikit-learn's train_test_split()?

The random_state parameter of train_test_split() controls the shuffling applied to the data before it is split. With random_state=None (the default), different executions produce different train and test sets, because the shuffle is not seeded. With a fixed integer such as random_state=0, every execution produces the same train and test sets.
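A quick way to see this is to call train_test_split twice on the data from the question with the same integer seed and compare the results. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the seed value 0 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), np.arange(5)

# Two calls with the same integer seed: the shuffle is seeded identically,
# so both calls return exactly the same split.
first = train_test_split(X, y, test_size=0.33, random_state=0)
second = train_test_split(X, y, test_size=0.33, random_state=0)
same_with_seed = all(np.array_equal(a, b) for a, b in zip(first, second))

# Two calls with random_state=None (the default): each call draws from
# NumPy's global random state, so the splits may differ between calls.
third = train_test_split(X, y, test_size=0.33, random_state=None)
fourth = train_test_split(X, y, test_size=0.33, random_state=None)

print("fixed seed, identical splits:", same_with_seed)
print("random_state=None, y_test:", third[3], fourth[3])
```

Running the script several times prints True every time for the fixed seed, while the y_test values printed for random_state=None can change from run to run.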

What is RandomState in scikit-learn?

In scikit-learn, a random_state parameter accepts an int, None, or a NumPy RandomState instance. numpy.random.RandomState is NumPy's (legacy) random number generator; it exposes a number of methods for drawing random numbers from a variety of probability distributions. In addition to the distribution-specific arguments, each method takes a keyword argument size that defaults to None. If size is None, a single value is generated and returned.
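The size behaviour can be checked with any of the distribution methods. The sketch below assumes NumPy is installed and uses RandomState.normal with seed 42; both the method and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)

# size=None (the default): a single value is returned.
single = rng.normal(loc=0.0, scale=1.0)

# size given: an array of that shape is returned.
batch = rng.normal(loc=0.0, scale=1.0, size=(2, 3))

# Re-creating the generator with the same seed reproduces the same first draw.
replay = np.random.RandomState(42).normal(loc=0.0, scale=1.0)

print(np.ndim(single), batch.shape, single == replay)
```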

What is the use of random_state=85?

Setting random_state=85 still shuffles the data randomly before splitting it, but with a twist: for a particular value of random_state, the shuffled order of the data, and therefore the split, is the same every time. The number 85 has no special meaning; any fixed integer makes the split reproducible.
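The same idea can be sketched with Python's standard random module, without scikit-learn: seeding a generator with 85 fixes the shuffled order, so repeating the shuffle yields the same order every time. The helper shuffled_order is hypothetical, and the order it produces is not the one scikit-learn would produce for the same seed, because scikit-learn draws from NumPy's generator rather than from the random module.

```python
import random


def shuffled_order(n_samples, seed):
    """Hypothetical helper: return indices 0..n_samples-1 shuffled by a seeded generator."""
    rng = random.Random(seed)  # private generator, independent of the global one
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices


order_a = shuffled_order(10, seed=85)
order_b = shuffled_order(10, seed=85)
print(order_a)
print(order_a == order_b)  # same seed, same order
```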

What does random_state=1 mean?

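random_state=1 seeds the random number generator with the integer 1. For a given number of samples and test_size (for example, test_size=0.25), there are many possible ways to divide the samples into a train set and a test set. The seed deterministically selects one of them, so random_state=1 always yields the same split, while a different seed will generally yield a different one.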
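For a small dataset the possible test sets can be listed explicitly. With 5 samples and test_size=0.25, scikit-learn puts ceil(0.25 × 5) = 2 samples in the test set, so there are C(5, 2) = 10 possible test sets. The sketch below uses only the Python standard library; seeded_test_set is a hypothetical helper that picks one combination with random.Random(seed) to illustrate how a fixed seed selects one particular combination. It does not reproduce scikit-learn's own choice for a given seed.

```python
import itertools
import math
import random

n_samples, n_test = 5, 2
samples = range(n_samples)

# Every possible test set (order ignored): C(5, 2) = 10 combinations.
all_test_sets = list(itertools.combinations(samples, n_test))
assert len(all_test_sets) == math.comb(n_samples, n_test)


def seeded_test_set(seed):
    """Hypothetical helper: pick one of the possible test sets using a seeded generator."""
    rng = random.Random(seed)
    return tuple(sorted(rng.sample(samples, n_test)))


# The same seed always selects the same combination.
print(seeded_test_set(1), seeded_test_set(1) == seeded_test_set(1))
# Every selected combination is one of the enumerated possibilities.
print(all(seeded_test_set(s) in all_test_sets for s in range(20)))
```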


2 Answers

Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.

On a serious note, random_state simply sets the seed of the random number generator, so that your train-test splits are deterministic. If you don't set a seed, the split is different each time.
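A minimal sketch, assuming scikit-learn and NumPy are installed: repeat the split from the question 20 times with and without a seed, and count how many distinct test sets appear. With random_state=None the count is almost always greater than 1 (the split is random, so a repeat is possible in principle), while with random_state=42 it is always 1. split_test_labels is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), np.arange(5)


def split_test_labels(random_state):
    """Return the test labels of one split as a tuple so it can go into a set."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.33, random_state=random_state)
    return tuple(y_test.tolist())


# No seed: 20 calls almost surely produce more than one distinct split.
unseeded = {split_test_labels(None) for _ in range(20)}
# Fixed seed: 20 calls always produce exactly one distinct split.
seeded = {split_test_labels(42) for _ in range(20)}

print(len(unseeded), len(seeded))
```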

Relevant documentation:

random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
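The three cases listed in the documentation can be compared directly. This sketch assumes scikit-learn and NumPy are installed; split_test_labels is a small hypothetical helper that returns only the test labels. It checks that an int and a freshly created RandomState with the same seed give the same split, and that with None the split follows NumPy's global generator, which np.random.seed controls.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), np.arange(5)


def split_test_labels(random_state):
    """Hypothetical helper: return the test labels of one split as a list."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.33, random_state=random_state)
    return y_test.tolist()


# int: used as the seed of a new RandomState.
from_int = split_test_labels(0)
# RandomState instance: used directly as the generator. A fresh instance
# seeded with 0 starts in the same state as the int case above.
from_fresh_instance = split_test_labels(np.random.RandomState(0))

# Reusing one instance: its state advances with every call, so later calls
# generally give different splits than the first one.
shared = np.random.RandomState(0)
first_call = split_test_labels(shared)
second_call = split_test_labels(shared)

# None: the global NumPy generator is used, so np.random.seed affects it.
np.random.seed(0)
from_global_a = split_test_labels(None)
np.random.seed(0)
from_global_b = split_test_labels(None)

print(from_int, from_fresh_instance, first_call, second_call, from_global_a == from_global_b)
```

The shared-instance case is the one that most often surprises people: passing the same RandomState object to several calls makes the whole sequence of splits reproducible, but each call in the sequence generally gets a different split.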

answered Sep 27 '22 at 17:09 by cs95


If you don't specify random_state in your code, then every time you run (execute) your code the data is shuffled differently, and the train and test datasets will contain different values each time.

However, if you assign a fixed value, such as random_state=0, 1, 42, or any other integer, then no matter how many times you execute your code the result will be the same, i.e., the same values in the train and test datasets.
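Because this claim is about separate executions, the check is most convincing when each split runs in a fresh Python process. A minimal sketch, assuming scikit-learn and NumPy are installed in the interpreter that runs it; SCRIPT and run_once are hypothetical names.

```python
import subprocess
import sys

# Small script that performs the split from the question and prints the test labels.
SCRIPT = """
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape((5, 2)), list(range(5))
_, _, _, y_test = train_test_split(X, y, test_size=0.33, random_state={seed})
print(y_test)
"""


def run_once(seed):
    """Run the split in a fresh Python process and return its printed output."""
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT.format(seed=seed)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Three independent executions with the same seed.
outputs = [run_once(42) for _ in range(3)]
print(outputs)
```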

answered Sep 27 '22 at 16:09 by Farzana Khan