 

Scikit-learn, GroupKFold with shuffling groups?

I was using StratifiedKFold from scikit-learn, but now I also need to take "groups" into account. There is the nice function GroupKFold, but my data are very time dependent. Similarly to the example in the help, the week number is the grouping index, and each week should appear in only one fold.

Suppose I need 10 folds. What I need is to shuffle the data first, before I can use GroupKFold.

The shuffling is in a group sense - whole groups should be shuffled among each other.

Is there an elegant way to do this with scikit-learn? It seems to me that GroupKFold is insensitive to shuffling the data first.

If there is no way to do it with scikit-learn, can anyone write some efficient code for this? I have large data sets.

matrix, label, groups as inputs
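
Roughly, what I mean by shuffling in the group sense is something like this sketch (assuming NumPy arrays matrix, label, groups):

import numpy as np

# Permute the distinct group labels, then reorder the rows so that
# whole groups move together (rows within a group stay adjacent).
rng = np.random.default_rng(0)
shuffled = rng.permutation(np.unique(groups))
rank = {g: i for i, g in enumerate(shuffled)}
order = np.argsort([rank[g] for g in groups], kind="stable")
matrix, label, groups = matrix[order], label[order], groups[order]

But reordering the rows like this does not change which weeks GroupKFold puts together in a fold, which is exactly my problem.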

asked Nov 26 '16 by gugatr0n1c

People also ask

Does Sklearn cross-validation shuffle?

By default no shuffling occurs, including for the (stratified) K-fold cross-validation performed by specifying cv=some_integer to cross_val_score, grid search, etc. Keep in mind that train_test_split still returns a random split.
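
For instance, a minimal sketch of opting into shuffling explicitly (toy data and estimator chosen just for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)

# cv=5 would use (Stratified)KFold without shuffling;
# pass a shuffled splitter instead to randomize fold membership.
scores = cross_val_score(LogisticRegression(), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))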

What is group shuffle split?

Shuffle-Group(s)-Out cross-validation iterator. Provides randomized train/test indices to split data according to a third-party provided group. This group information can be used to encode arbitrary domain specific stratifications of the samples as integers.
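
A minimal usage sketch with toy arrays:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(8).reshape(8, 1)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# Each split holds out whole groups at random; a group is never split
# across train and test within the same split.
gss = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
for train_idx, test_idx in gss.split(X, y, groups):
    print(np.unique(groups[test_idx]))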

What is shuffle in KFold?

KFold cross-validation with shuffle: in plain k-fold cross-validation, the dataset is divided into k folds in order. When shuffle and random_state are set on KFold, the samples are assigned to folds at random: kfs = KFold(n_splits=5, shuffle=True, random_state=2021)

What is shuffle split cross-validation?

Shuffle Split method: repeated random subsampling validation, also referred to as Monte Carlo cross-validation, splits the dataset randomly into training and validation sets. Unlike k-fold cross-validation, the dataset is not partitioned into fixed folds; each split is drawn at random, so samples may appear in the validation set of several splits.
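
A minimal sketch with toy data:

import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(10).reshape(10, 1)

# Each of the 5 splits draws a fresh random 30% validation subset;
# unlike KFold, the test sets may overlap across splits.
ss = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
for train_idx, test_idx in ss.split(X):
    print(test_idx)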


1 Answer

The same group will not appear in two different folds (the number of distinct groups has to be at least equal to the number of folds)

In GroupKFold the groups array has the same length as the data, i.e. one group label per sample.

For data in X, y and groups:

import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X = np.array([[1, 2, 1, 1], [3, 4, 7, 8], [5, 6, 1, 3], [7, 8, 4, 7]])
y = np.array([0, 2, 1, 2])
groups = np.array([2, 1, 0, 1])

# One fold per distinct group (3 distinct groups here), so every group
# appears in exactly one test fold.
group_kfold = GroupKFold(n_splits=len(np.unique(groups)))
group_kfold.get_n_splits(X, y, groups)

param_grid = {
    'min_child_weight': [50, 100],
    'subsample': [0.1, 0.2],
    'colsample_bytree': [0.1, 0.2],
    'max_depth': [2, 3],
    'learning_rate': [0.01],
    'n_estimators': [100, 500],
    'reg_lambda': [0.1, 0.2]
}

xgb = XGBClassifier()

# Pass the group-aware splits to GridSearchCV via the cv argument.
grid_search = GridSearchCV(xgb, param_grid, cv=group_kfold.split(X, y, groups), n_jobs=-1)

result = grid_search.fit(X, y)
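
GroupKFold assigns groups to folds deterministically, so if you also want the randomized group-to-fold assignment the question asks about, one possible sketch (reusing groups, xgb and param_grid from above) is to shuffle the distinct groups, deal them into folds yourself, and pass the resulting index pairs as cv:

rng = np.random.default_rng(0)
shuffled_groups = rng.permutation(np.unique(groups))
n_splits = len(shuffled_groups)

# Deal the shuffled groups round-robin into folds, then build
# (train_indices, test_indices) pairs that GridSearchCV accepts as cv.
fold_of = {g: i % n_splits for i, g in enumerate(shuffled_groups)}
fold_id = np.array([fold_of[g] for g in groups])
cv_splits = [(np.where(fold_id != k)[0], np.where(fold_id == k)[0])
             for k in range(n_splits)]

grid_search = GridSearchCV(xgb, param_grid, cv=cv_splits, n_jobs=-1)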
answered Sep 17 '22 by Mukul Gupta