Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Skip forbidden parameter combinations when using GridSearchCV

I want to greedily search the entire parameter space of my support vector classifier using GridSearchCV. However, some combinations of parameters are forbidden by LinearSVC and throw an exception. In particular, there are mutually exclusive combinations of the dual, penalty, and loss parameters:

For example, this code:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'dual':[True, False], 'penalty' : ['l1', 'l2'], \
              'loss': ['hinge', 'squared_hinge']}
svc = svm.LinearSVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)

Returns ValueError: Unsupported set of arguments: The combination of penalty='l2' and loss='hinge' are not supported when dual=False, Parameters: penalty='l2', loss='hinge', dual=False

My question is: is it possible to make GridSearchCV skip combinations of parameters which the model forbids? If not, is there an easy way to construct a parameter space which won't violate the rules?

like image 648
crypdick Avatar asked Mar 24 '17 21:03

crypdick


People also ask

What is Param_grid in GridSearchCV?

param_grid – A dictionary with parameter names as keys and lists of parameter values. 3. scoring – The performance measure. For example, 'r2' for regression models, 'precision' for classification models.

What is CV parameter in GridSearchCV?

cv: number of cross-validation you have to try for each selected set of hyperparameters. verbose: you can set it to 1 to get the detailed print out while you fit the data to GridSearchCV.

What is IID GridSearchCV?

iid : boolean, default=True. If True, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds. cv : int, cross-validation generator or an iterable, optional.


1 Answers

I solved this problem by passing error_score=0.0 to GridSearchCV:

error_score : ‘raise’ (default) or numeric

Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.

UPDATE: newer versions of sklearn print out a bunch of ConvergenceWarning and FitFailedWarning. I had a hard time surppressing them with contextlib.suppress, but there is a hack around that involving a testing context manager:

from sklearn import svm, datasets 
from sklearn.utils._testing import ignore_warnings 
from sklearn.exceptions import FitFailedWarning, ConvergenceWarning 
from sklearn.model_selection import GridSearchCV 

with ignore_warnings(category=[ConvergenceWarning, FitFailedWarning]): 
    iris = datasets.load_iris() 
    parameters = {'dual':[True, False], 'penalty' : ['l1', 'l2'], \ 
                 'loss': ['hinge', 'squared_hinge']} 
    svc = svm.LinearSVC() 
    clf = GridSearchCV(svc, parameters, error_score=0.0) 
    clf.fit(iris.data, iris.target)
like image 62
crypdick Avatar answered Sep 28 '22 01:09

crypdick