When should we use
X_train, X_test, y_train, y_test = train_test_split(data, test_size=0.3, random_state=42)
and when should we use
train, test = train_test_split(data, test_size=0.3, random_state=0)?
The first one returns this error:
ValueError: not enough values to unpack (expected 4, got 2)
In general, putting 80% of the data in the training set, 10% in the validation set, and 10% in the test set is a good split to start with. The optimal split between the training, validation, and test sets depends on factors such as the use case, the structure of the model, and the dimensionality of the data.
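train_test_split only produces two parts per call, so one common way to get an 80/10/10 train/validation/test split is to call it twice: first hold out 20%, then split that hold-out in half. A minimal sketch, assuming synthetic NumPy data (the arrays X and y below are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic example data: 100 samples, 4 features, binary labels (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Step 1: keep 80% for training, hold out the remaining 20%
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: split the 20% hold-out in half -> 10% validation, 10% test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```

The validation set is used for tuning (model choice, hyperparameters), and the test set is kept untouched until the final evaluation.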
A common rule of thumb is to use 20-30% of the data for testing and the remaining 70-80% for training.
Using similar data for training and testing minimizes the effects of data discrepancies and gives you a better understanding of how the model behaves. After the model has been trained on the training set, you evaluate it by making predictions on the test set.
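The train-then-evaluate workflow described above can be sketched as follows. This assumes scikit-learn's bundled iris dataset and a LogisticRegression classifier purely as stand-ins; any estimator with fit/predict works the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: X holds the features, y holds the labels
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the model on the training set only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate by predicting on the unseen test set
y_pred = model.predict(X_test)
accuracy = (y_pred == y_test).mean()
print(f"Test accuracy: {accuracy:.3f}")
```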
Use the first form when you want to split features (X) and labels (y) together. Use the second form when you pass only a single array (for example, only the features X).
X_train, X_test, y_train, y_test= train_test_split(data, y, test_size=0.3, random_state=42)
It didn't work for you because you passed only one array (data) to train_test_split(), so it returned two values instead of four. The call above should work; just replace y with your label/target data.
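If, as in the question, all columns (including the target) live in one pandas DataFrame called data, you first separate the features from the label column and then pass both to train_test_split. A sketch; the column names feature_a, feature_b and target are hypothetical placeholders for your own:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical DataFrame: two feature columns plus a "target" label column
data = pd.DataFrame({
    "feature_a": range(10),
    "feature_b": [v * 0.5 for v in range(10)],
    "target": [0, 1] * 5,
})

# Separate the features (X) from the label column (y)
X = data.drop(columns=["target"])
y = data["target"]

# Two arrays in -> four arrays out
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```

Because both inputs are split with the same shuffled indices, the rows of X_test still line up with the labels in y_test.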
If you pass one array, it is split into two:
                             |---data_train
data ----train_test_split()--|
                             |---data_test
If you pass two arrays, each of them is split into two, giving four outputs in total:
                                       |---data_train_x
                                       |---data_train_y
data_x, data_y ----train_test_split()--|
                                       |---data_test_x
                                       |---data_test_y
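The same holds for n arrays: you get 2n outputs. Note that they are returned array by array, i.e. data_train_x, data_test_x, data_train_y, data_test_y, which is why the unpacking order is X_train, X_test, y_train, y_test.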
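The "n arrays in, 2n outputs" behaviour and the output order can be checked directly. A small sketch using three made-up NumPy arrays a, b and c with the same number of rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Three arrays with the same number of rows (10 samples each), made up for illustration
a = np.arange(10)
b = np.arange(10) * 10
c = np.arange(10) * 100

# One array in -> 2 outputs
parts_1 = train_test_split(a, test_size=0.3, random_state=0)
print(len(parts_1))  # 2

# Three arrays in -> 6 outputs, ordered a_train, a_test, b_train, b_test, c_train, c_test
a_train, a_test, b_train, b_test, c_train, c_test = train_test_split(
    a, b, c, test_size=0.3, random_state=0
)

# The same rows are selected for every array, so the pairing between arrays is preserved
print((b_train == a_train * 10).all(), (c_test == a_test * 100).all())  # True True
```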