I am using the KFold class from the sklearn package in Python on a df (DataFrame) with non-continuous row indexes.
This is the code:
kFold = KFold(n_splits=10, shuffle=True, random_state=None)
for train_index, test_index in kFold.split(dfNARemove):...
I get some train_index or test_index values that don't exist in my df's index.
What can I do?
K-fold cross-validation gives K different scores (accuracy percentages), one for each of the K held-out test sets. We generally take their average to evaluate the model.
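As a rough illustration of averaging the K scores, here is a minimal sketch. The iris dataset and the logistic regression model are stand-ins chosen for the example, not anything from the question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Example data and model (assumptions for illustration only)
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 10 folds, shuffled with a fixed seed so the run is repeatable
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy score per fold
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# The average over the folds is the usual summary of model quality
mean_score = scores.mean()
print(scores)
print(mean_score)
```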
Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation while the k - 1 remaining folds form the training set.
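A small sketch of what "consecutive folds" means in practice, using a made-up array of six rows. Without shuffling, each test fold is a contiguous block of positions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Six toy samples (made-up data for illustration)
X = np.arange(6).reshape(6, 1)

# shuffle defaults to False, so folds are consecutive blocks
kf = KFold(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]

for train, test in folds:
    print("train:", train, "test:", test)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```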
If shuffle is True, the whole data is first shuffled and then split into the K-Folds. For repeatable behavior, you can set the random_state, for example to an integer seed (random_state=0). If your parameters depend on the shuffling, this means your parameter selection is very unstable.
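To see the effect of random_state, the sketch below builds the shuffled folds twice with the same seed and compares them. The helper held_out_folds and the toy array are hypothetical names introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples (made-up data for illustration)
X = np.arange(20).reshape(10, 2)

def held_out_folds(seed):
    """Return the test positions of each fold for a given seed (hypothetical helper)."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test.tolist() for _, test in kf.split(X)]

# The same integer seed reproduces exactly the same folds
first_run = held_out_folds(0)
second_run = held_out_folds(0)
print(first_run == second_run)

# Every sample lands in exactly one test fold
all_positions = sorted(p for fold in first_run for p in fold)
print(all_positions == list(range(10)))
```

With random_state=None, as in the question, the folds will generally differ from run to run.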
The KFold iterator yields positional indices of the train and validation rows of the DataFrame, not their non-continuous index labels. You can access the train and validation rows with the pandas .iloc indexer:
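from sklearn.model_selection import KFold

kFold = KFold(n_splits=10, shuffle=True, random_state=None)
for train_index, test_index in kFold.split(dfNARemove):
    # train_index and test_index are positions, so select rows with .iloc
    train_data = dfNARemove.iloc[train_index]
    test_data = dfNARemove.iloc[test_index]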
If you want to know which non-continuous index labels are used for the train and test rows on each fold, you can do the following inside the loop:
non_continuous_train_index = dfNARemove.index[train_index]
non_continuous_test_index = dfNARemove.index[test_index]
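Putting the pieces together, here is a self-contained sketch. It assumes the gaps in the index come from dropping rows with missing values (a guess based on the name dfNARemove); the column name and values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame; dropna() leaves the row labels 0, 2, 4, 5, 7, 8 (assumed scenario)
raw = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0, np.nan, 8.0, 9.0]})
df = raw.dropna()

kf = KFold(n_splits=3, shuffle=True, random_state=0)
results = []
for train_pos, test_pos in kf.split(df):
    # Positions always run from 0 to len(df) - 1, whatever the labels are
    train_data = df.iloc[train_pos]
    test_data = df.iloc[test_pos]
    # Map positions back to the original row labels
    train_labels = df.index[train_pos]
    test_labels = df.index[test_pos]
    results.append((train_pos, test_pos, train_data, test_data, train_labels, test_labels))
    print("test positions:", test_pos.tolist(), "test labels:", test_labels.tolist())
```

Position 1 is a valid row position here even though label 1 was dropped, which is why label-based lookups such as df.loc[train_pos] can fail. Another option is to call df.reset_index(drop=True) before splitting, so that labels and positions coincide.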