 

Scikit Learn GridSearchCV without cross validation (unsupervised learning)


Is it possible to use GridSearchCV without cross validation? I am trying to optimize the number of clusters in KMeans clustering via grid search, so I neither need nor want cross validation.

The documentation also confuses me: the fit() method has an option for unsupervised learning (it says to pass None as y). But unsupervised learning should be done without cross validation, and there appears to be no option to turn cross validation off.

DataMan asked Jun 19 '17 17:06



1 Answer

After much searching, I was able to find this thread. It appears that you can get rid of cross validation in GridSearchCV if you use:

cv=[(slice(None), slice(None))]
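As a rough illustration of why this works (my own sketch, not taken from the linked thread): each entry of cv is a (train indices, test indices) pair, and slice(None) is the object form of [:], so it selects every row. The list named rows is a hypothetical stand-in for a dataset.

```python
# Each entry of `cv` is a (train_indices, test_indices) pair.
# slice(None) is equivalent to [:], i.e. "take everything".
cv = [(slice(None), slice(None))]

rows = ["a", "b", "c", "d"]  # hypothetical stand-in for the rows of a dataset

for train_idx, test_idx in cv:
    train = rows[train_idx]  # every row is used for fitting
    test = rows[test_idx]    # the same rows are used for scoring

n_splits = len(cv)  # a single "fold", so each candidate is fit and scored once
```

So the search still goes through the cross validation machinery, but with one split in which the training set and the scoring set are both the full data.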

I have tested this against my own coded version of grid search without cross validation and I get the same results from both methods. I am posting this answer to my own question in case others have the same issue.
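For reference, a hand-coded grid search without cross validation might look roughly like the sketch below. It is not my original comparison code: the synthetic data from make_blobs, the candidate values of n_clusters, and the choice of KMeans with the silhouette score are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid

# Synthetic data (assumption): three well-separated blobs in 2D.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Hypothetical grid; every value is valid for silhouette_score (2 <= k < n_samples).
param_grid = {"n_clusters": [2, 3, 4, 5]}

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):
    # Fit on the full data and score on the same data: no train/test split.
    model = KMeans(n_init=10, random_state=0, **params).fit(X)
    score = silhouette_score(X, model.labels_)
    if score > best_score:
        best_score, best_params = score, params
```

After the loop, best_params holds the setting with the highest silhouette score, which is the value to compare against gs.best_params_ from the GridSearchCV run.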

Edit: to answer jjrr's question in the comments, here is an example use case:

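from sklearn.cluster import MeanShift
from sklearn.metrics import silhouette_score as sc
from sklearn.model_selection import GridSearchCV

def cv_silhouette_scorer(estimator, X):
    # Fit on X and score the resulting cluster labels.
    estimator.fit(X)
    cluster_labels = estimator.labels_
    num_labels = len(set(cluster_labels))
    num_samples = len(X.index)
    # silhouette_score is undefined for a single cluster
    # or for one cluster per sample, so return the worst score.
    if num_labels == 1 or num_labels == num_samples:
        return -1
    return sc(X, cluster_labels)

# One "split" whose train and test sets are both the full dataset.
cv = [(slice(None), slice(None))]
# param_dict, df and cols_of_interest are defined elsewhere.
gs = GridSearchCV(estimator=MeanShift(), param_grid=param_dict,
                  scoring=cv_silhouette_scorer, cv=cv, n_jobs=-1)
gs.fit(df[cols_of_interest])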
DataMan answered Oct 02 '22 17:10