Why does calling the KFold generator with shuffle give the same indices?

Tags: python, scikit-learn, cross-validation

With sklearn, when you create a new KFold object with shuffle=True, it produces different, newly randomized fold indices. However, every generator obtained from a given KFold object yields the same indices for each fold, even when shuffle is true. Why does it work like this?

Example:

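import numpy as np
# Legacy module: sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.cross_validation import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
# 4 samples split into 2 folds, with shuffling enabled
kf = KFold(4, n_folds=2, shuffle=True)

for fold in kf:
    print(fold)

print('---second round----')

for fold in kf:
    print(fold)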

Output:

(array([2, 3]), array([0, 1]))
(array([0, 1]), array([2, 3]))
---second round----   # same indices for the folds
(array([2, 3]), array([0, 1]))
(array([0, 1]), array([2, 3]))

This question was motivated by a comment on this answer. I decided to split it into a new question to prevent that answer from becoming too long.


asked Jan 22 '16 06:01

ilyas patanam

1 Answer

Iterating over the same KFold object again does not reshuffle the indices; the shuffle happens only once, when the object is instantiated. KFold() never sees the data, but it knows the number of samples, so it uses that to build and shuffle the indices. From the code that runs during instantiation of KFold:

if shuffle:
    rng = check_random_state(self.random_state)
    rng.shuffle(self.idxs)

Each time a generator is created to iterate through the indices of each fold, it uses the same shuffled indices and divides them the same way.

Take a look at the code for the base class of KFold, _PartitionIterator(with_metaclass(ABCMeta)), where __iter__ is defined. The __iter__ method in the base class calls _iter_test_indices in KFold to divide the stored indices and yield the train and test indices for each fold.
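To make the mechanism concrete, here is a minimal sketch of the same pattern in plain Python. ToyKFold, its seed parameter, and the fold-slicing logic are hypothetical stand-ins written for illustration; this is not the scikit-learn implementation, only a toy that shuffles once in __init__ and reuses the stored permutation in __iter__.

```python
import random


class ToyKFold:
    """Hypothetical stand-in for the legacy KFold; not scikit-learn code."""

    def __init__(self, n, n_folds=2, shuffle=False, seed=None):
        self.n = n
        self.n_folds = n_folds
        self.idxs = list(range(n))
        if shuffle:
            # The shuffle happens once, here, at construction time.
            random.Random(seed).shuffle(self.idxs)

    def __iter__(self):
        # Every new iterator slices the same stored permutation.
        fold_size = self.n // self.n_folds  # assumes n is divisible by n_folds
        for k in range(self.n_folds):
            test = self.idxs[k * fold_size:(k + 1) * fold_size]
            train = [i for i in self.idxs if i not in test]
            yield sorted(train), sorted(test)


kf = ToyKFold(4, n_folds=2, shuffle=True)
first_pass = list(kf)
second_pass = list(kf)
print(first_pass == second_pass)  # True: the permutation was fixed in __init__
```

Under this design, the only way to get a fresh shuffle is to construct a new object, which matches the behaviour described in the question.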

answered Nov 14 '22 22:11

ilyas patanam
