
Setting seed on train_test_split sklearn python

Is there any way to set a seed for train_test_split in Python's sklearn? I have set the random_state parameter to an integer, but I still cannot reproduce the result.

Thanks in advance.

Bernando Purba asked May 16 '19 10:05





1 Answer

from sklearn.model_selection import train_test_split
x = list(range(10))
y = list(range(10))
# Split the same data three times with the same random_state.
for _ in range(3):
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=11)
    # Prints the identical x_train on every iteration.
    print(x_train)

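The code above produces the same x_train every time I split the data, so random_state does work as a seed. If you still cannot reproduce your result, the randomness is probably in your dataframe (for example, its rows arriving in a different order on each run), not in train_test_split.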
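To illustrate the point about the input data, here is a minimal sketch. It assumes the problem is that the rows arrive in a different order between runs; the helper split_train and the reversed list are hypothetical stand-ins for your own loading code, not something taken from the question.

```python
from sklearn.model_selection import train_test_split


def split_train(values, seed=11):
    """Return the training part of a seeded 60/40 split (hypothetical helper)."""
    train, _test = train_test_split(values, test_size=0.4, random_state=seed)
    return train


data = list(range(10))

# Same input order + same random_state -> identical split.
same_order_matches = split_train(data) == split_train(data)

# Simulate the same rows arriving in a different order between runs
# (for example from an unordered query or an unseeded shuffle).
reordered = data[::-1]
reordered_matches = split_train(data) == split_train(reordered)

# Putting the rows into a canonical order before splitting
# makes the seeded split reproducible again.
restored_matches = split_train(sorted(reordered)) == split_train(data)

print(same_order_matches, reordered_matches, restored_matches)
# prints: True False True
```

The seed fixes which positions go into the training set, not which rows, so the same random_state only gives the same split when the input order is also the same. If this is your situation, sorting by a stable key (or seeding whatever shuffles the data upstream) before calling train_test_split should help.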

secretive answered Nov 09 '22 06:11
