After calling cross_validation.KFold(n, n_folds=folds), I would like to access the training and testing indexes of a single fold, instead of going through all the folds.
So let's take the example code:
import numpy as np
from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)
>>> print(kf)
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
random_state=None)
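>>> for train_index, test_index in kf:
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]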
I would like to access the first fold in kf like this (instead of using a for loop):
train_index, test_index in kf[0]
This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"
What I want as output:
>>> train_index, test_index in kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]
Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html
How do I retrieve the indexes for train and test for only a single fold, without going through the whole for loop?
Shuffled KFold: with shuffle=True, KFold picks at random which data points become part of the train and test sets. To be precise, the choice is not completely random: random_state seeds the shuffle, so it determines which points appear in each set, and the same random_state always results in the same split.
KFold provides train/test indices to split data into train and test sets. It splits the dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation set while the k - 1 remaining folds form the training set.
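To make the difference between the default and the shuffled behaviour concrete, here is a minimal sketch. It assumes the newer sklearn.model_selection API rather than the cross_validation module from the question, and the six-sample array is made-up data for illustration only.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)  # 6 samples, 2 features (made-up data)

# Default (shuffle=False): the test folds are consecutive blocks of indices.
plain_tests = [test.tolist() for _, test in KFold(n_splits=3).split(X)]
print(plain_tests)  # [[0, 1], [2, 3], [4, 5]]

# shuffle=True with a fixed random_state: random-looking, but repeatable.
first = [test.tolist()
         for _, test in KFold(n_splits=3, shuffle=True, random_state=0).split(X)]
second = [test.tolist()
          for _, test in KFold(n_splits=3, shuffle=True, random_state=0).split(X)]
print(first == second)  # True
```

Whatever the shuffle order, every sample index still lands in exactly one test fold.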
You are on the right track. All you need to do now is:
kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)
train, test = mylist[0]
kf is actually an iterable that produces its splits lazily, like a generator: it doesn't compute a train-test split until it is needed, which is also why it cannot be indexed. This improves memory usage, as you are not storing items you don't need. Making a list of the KFold object forces it to make all values available.
Here are two great SO questions that explain what generators are: one and two
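If the lazy behaviour is unclear, the sketch below may help. lazy_folds is a hypothetical, standard-library-only stand-in for KFold (it is not sklearn code) that prints a message each time a fold is actually computed, so you can see that next() stops after the first fold.

```python
from itertools import islice

def lazy_folds(n, n_folds):
    """Hypothetical stand-in for KFold: lazily yields (train, test) index lists."""
    fold_size = n // n_folds  # assumes n is divisible by n_folds
    for k in range(n_folds):
        print("computing fold", k)
        test = list(range(k * fold_size, (k + 1) * fold_size))
        train = [i for i in range(n) if i not in test]
        yield train, test

# next() computes the first fold only; the second fold is never built.
train, test = next(lazy_folds(4, 2))
print("TRAIN:", train, "TEST:", test)  # TRAIN: [2, 3] TEST: [0, 1]

# islice steps through folds 0..k-1 (they are computed, then discarded)
# and hands back fold k without storing the earlier ones in a list.
k = 1
train_k, test_k = next(islice(lazy_folds(4, 2), k, None))
print("TRAIN:", train_k, "TEST:", test_k)  # TRAIN: [0, 1] TEST: [2, 3]
```

Compared with list(kf), this trades a little convenience for not holding every split in memory at once, which only matters when there are many folds or very large index arrays.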
Edit Nov 2018
The API has changed: as of sklearn 0.20 the cross_validation module is gone, and KFold lives in sklearn.model_selection. An updated example (for py3.6):
from sklearn.model_selection import KFold
import numpy as np
kf = KFold(n_splits=2)  # two folds, as in the original question
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
# split() yields arrays of row indices, not the rows themselves
train_index, test_index = next(kf.split(X))
In [12]: train_index
Out[12]: array([2, 3])
In [13]: test_index
Out[13]: array([0, 1])
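next() only gets you the first fold. If you need some other fold, one possible approach is the sketch below: nth_fold is a hypothetical helper name of my own (not part of sklearn) that uses itertools.islice to skip ahead. It also shows that the arrays returned by split() are indices, which you then use to slice X and y.

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold

def nth_fold(splitter, X, k):
    """Hypothetical helper: return (train_index, test_index) for fold k (0-based)."""
    return next(islice(splitter.split(X), k, None))

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])

# Fetch the second fold (k=1) of a 2-fold split.
train_index, test_index = nth_fold(KFold(n_splits=2), X, 1)
print("TRAIN:", train_index, "TEST:", test_index)  # TRAIN: [0 1] TEST: [2 3]

# The indices select the actual rows and labels.
X_train, X_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]
```

If k is not smaller than the number of splits, next() raises StopIteration, so you may want to check k against splitter.get_n_splits(X) first.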