sklearn KFold: access a single fold instead of a for loop

Tags:

python

scikit-learn

cross-validation

After using cross_validation.KFold(n, n_folds=folds) I would like to access the training and testing indexes of a single fold, instead of going through all the folds.

So let's take the example code:

import numpy as np
from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)

>>> print(kf)
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
                               random_state=None)
>>> for train_index, test_index in kf:
...     print("TRAIN:", train_index, "TEST:", test_index)

I would like to access the first fold in kf like this (instead of a for loop):

train_index, test_index = kf[0]

This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"

What I want as output:

>>> train_index, test_index = kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]

Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html

Question

How do I retrieve the indexes for train and test for only a single fold, without going through the whole for loop?

286

asked Dec 09 '14 13:12

NumesSanguis

1 Answer

You are on the right track. All you need to do now is:

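from sklearn import cross_validation

kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)  # materialize all folds so they can be indexed
train, test = mylist[0]  # index arrays of the first fold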

kf is not a list but a lazy iterable that behaves like a generator: it doesn't compute a train-test split until it is needed, which is why it can't be indexed directly. This saves memory, as you are not storing items you don't need. Wrapping the KFold object in list() forces it to produce all the splits up front.
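To make the lazy behaviour concrete, here is a minimal pure-Python sketch. Note that simple_kfold is a hypothetical stand-in written only for illustration, not scikit-learn's implementation; it assumes unshuffled, contiguous folds and that n divides evenly by n_folds.

```python
def simple_kfold(n, n_folds):
    """Hypothetical stand-in for KFold: lazily yield (train, test) index lists."""
    fold_size = n // n_folds  # assumes n is divisible by n_folds
    for k in range(n_folds):
        test = list(range(k * fold_size, (k + 1) * fold_size))
        train = [i for i in range(n) if i not in test]
        yield train, test


folds = simple_kfold(4, n_folds=2)

# A generator cannot be indexed, which mirrors the TypeError in the question.
try:
    folds[0]
    indexing_works = True
except TypeError:
    indexing_works = False

# next() computes only the first fold; list() would compute all of them.
first_train, first_test = next(folds)
print("TRAIN:", first_train, "TEST:", first_test)
```

Calling next() stops after one fold, so it is probably the cheaper choice when only the first split is needed, while list() is handy when you want to index several folds afterwards.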

Here are two great SO questions that explain what generators are: one and two

Edit Nov 2018

The API has changed: sklearn.cross_validation was deprecated in 0.18 and removed in 0.20 in favor of sklearn.model_selection. An updated example (for py3.6):

from sklearn.model_selection import KFold
import numpy as np

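kf = KFold(n_splits=2)  # 2 folds, matching the output below

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

# split() yields (train indices, test indices); next() takes only the first fold
train_index, test_index = next(kf.split(X))

In [12]: train_index
Out[12]: array([2, 3])

In [13]: test_index
Out[13]: array([0, 1])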
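If you need a fold other than the first, one possible approach is itertools.islice, which skips the earlier folds without building a full list. This is a sketch rather than part of the original answer; the variable name fold and its value are illustrative assumptions.

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

fold = 1  # zero-based position of the fold we want (illustrative choice)

# islice skips the earlier folds; next() then takes the requested one
train_index, test_index = next(islice(kf.split(X), fold, None))

print("TRAIN:", train_index, "TEST:", test_index)
```

With shuffle left at its default of False, this should select the second fold, whose test indices are the last two rows. If fold is out of range, next() raises StopIteration, so you may want to guard against that in real code.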

133

answered Sep 24 '22 18:09

mbatchkarov
