 

Result of GridSearchCV as table

I did a grid search with cross-validation on an SVM with an RBF kernel to find the optimal values of the parameters C and gamma, using the GridSearchCV class. Now I would like to get the results in a tabular format like

C/gamma 1e-3 1e-2 1e3
0.1      0.2  ..  0.3
1        0.9
10       ..   
100      ..

where each cell contains the accuracy score for that pair of parameter values.

Or, if the first format is not possible, something simpler like

C    gamma  accuracy
0.1  1e-4      0.2 
...

I am not very skilled in Python, so I don't know where to start. Could you suggest a way to produce this kind of representation? Ideally I would like the table as a plot, but a simple printout in the console in either of these formats would also be fine. Thank you in advance.

Gianluca Amprimo asked Nov 13 '19 10:11


People also ask

What happens in GridSearchCV?

It evaluates every combination of the parameter values supplied in the parameter grid and reports the best combination according to a scoring metric of your choice (accuracy, F1, etc.). It has drawbacks, though: the “best parameters” can only be chosen from the values you put in the grid, and the exhaustive search is time-consuming.
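To see exactly which combinations an exhaustive search will try, you can expand a grid with ParameterGrid from sklearn.model_selection, which enumerates candidates the same way GridSearchCV does. This is a small sketch; the C and gamma values below are hypothetical, chosen only to mirror the question.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for an RBF SVM: 4 values of C x 3 values of gamma
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}

# Expand the dict into every combination an exhaustive grid search would evaluate
candidates = list(ParameterGrid(param_grid))

print(len(candidates))  # 12
print(candidates[0])    # {'C': 0.1, 'gamma': 0.001}
```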

How much time does GridSearchCV take?

In one reported comparison, for a parameter grid with 3125 combinations, GridSearchCV took 10856 seconds (~3 hours) whereas HalvingGridSearchCV took 465 seconds (~8 minutes), roughly 23 times faster.
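The cost grows quickly because GridSearchCV fits one model per (candidate, fold) pair. The arithmetic below is a sketch: it assumes a hypothetical grid of 5 hyperparameters with 5 values each (one way to reach 3125 combinations) and cv=5, and also checks the speed-up implied by the timings quoted above.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 5 hyperparameters with 5 values each -> 5**5 = 3125 combinations
grid = {f"param_{i}": list(range(5)) for i in range(5)}
n_candidates = len(ParameterGrid(grid))

# With cv=5, one model is fitted per (candidate, fold) pair ...
n_folds = 5
n_cv_fits = n_candidates * n_folds
# ... plus one final refit of the best candidate on all the data when refit=True (the default)
n_total_fits = n_cv_fits + 1

# Speed-up implied by the reported timings (seconds)
speedup = 10856 / 465

print(n_candidates, n_cv_fits, n_total_fits)  # 3125 15625 15626
print(f"{speedup:.1f}x")                      # 23.3x
```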

Does GridSearchCV do cross-validation?

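Yes. GridSearchCV cross-validates every candidate parameter combination, and the reported score (mean_test_score) is the average over the folds.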
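You can confirm this from cv_results_: it holds one splitK_test_score entry per fold, and mean_test_score is their average. The sketch below assumes the iris dataset and a small hypothetical C/gamma grid.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
# Hypothetical grid; 5-fold cross-validation for each of the 4 candidates
search = GridSearchCV(svm.SVC(kernel="rbf"), {"C": [1, 10], "gamma": [0.01, 0.1]}, cv=5)
search.fit(iris.data, iris.target)

results = search.cv_results_
# One per-fold score array per split: split0_test_score ... split4_test_score
split_keys = [k for k in results if k.startswith("split") and k.endswith("_test_score")]
print(split_keys)

# Stack per-fold scores into shape (n_folds, n_candidates) and average over folds
per_fold = np.vstack([results[k] for k in split_keys])
print(np.allclose(per_fold.mean(axis=0), results["mean_test_score"]))  # True
```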

What is the cv parameter in GridSearchCV?

cv: the number of cross-validation folds used for each candidate set of hyperparameters (an integer, or a cross-validation splitter object).
verbose: set it to 1 or higher to print progress details while GridSearchCV is fitting the data.
n_jobs: the number of jobs to run in parallel; -1 uses all available processors.
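A minimal sketch showing the three parameters together, assuming the iris dataset and a hypothetical C/gamma grid. With verbose=1, scikit-learn prints a line such as "Fitting 3 folds for each of 6 candidates, totalling 18 fits".

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

search = GridSearchCV(
    svm.SVC(kernel="rbf"),
    {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2]},  # hypothetical grid: 3 x 2 = 6 candidates
    cv=3,       # 3-fold cross-validation for every candidate
    verbose=1,  # print a progress summary while fitting
    n_jobs=-1,  # run the fits in parallel on all available processors
)
search.fit(X, y)

print(search.n_splits_)    # 3
print(search.best_params_)
```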




2 Answers

You can use the cv_results_ attribute of the GridSearchCV object, as shown below:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC(gamma="scale")
clf = GridSearchCV(svc, parameters, cv=5)
clf.fit(iris.data, iris.target)

Now inspect clf.cv_results_, which contains:

{'mean_fit_time': array([0.00049248, 0.00051575, 0.00051174, 0.00044131]),
 'mean_score_time': array([0.0002739 , 0.00027657, 0.00023718, 0.00023627]),
 'mean_test_score': array([0.98      , 0.96666667, 0.97333333, 0.98      ]),
 'param_C': masked_array(data=[1, 1, 10, 10],
              mask=[False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf'],
              mask=[False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'C': 1, 'kernel': 'linear'},
  {'C': 1, 'kernel': 'rbf'},
  {'C': 10, 'kernel': 'linear'},
  {'C': 10, 'kernel': 'rbf'}],
 'rank_test_score': array([1, 4, 3, 1], dtype=int32),
 'split0_test_score': array([0.96666667, 0.96666667, 1.        , 0.96666667]),
 'split1_test_score': array([1.        , 0.96666667, 1.        , 1.        ]),
 'split2_test_score': array([0.96666667, 0.96666667, 0.9       , 0.96666667]),
 'split3_test_score': array([0.96666667, 0.93333333, 0.96666667, 0.96666667]),
 'split4_test_score': array([1., 1., 1., 1.]),
 'std_fit_time': array([1.84329827e-04, 1.34653950e-05, 1.26220210e-04, 1.76294378e-05]),
 'std_score_time': array([6.23956317e-05, 1.34498512e-05, 3.57596078e-06, 4.68175419e-06]),
 'std_test_score': array([0.01632993, 0.02108185, 0.03887301, 0.01632993])}

You can combine params and mean_test_score to construct the DataFrame you are looking for:

import pandas as pd

pd.concat([pd.DataFrame(clf.cv_results_["params"]), pd.DataFrame(clf.cv_results_["mean_test_score"], columns=["Accuracy"])], axis=1)

The resulting DataFrame looks like this:

    C   kernel  Accuracy
0   1   linear  0.980000
1   1   rbf     0.966667
2   10  linear  0.973333
3   10  rbf     0.980000

Hope this helps!
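To get the C × gamma grid from the question rather than one row per candidate, the long-format table can be reshaped with pandas' DataFrame.pivot. This is a sketch assuming the iris data and a hypothetical RBF grid (C in 0.1, 1, 10, 100 and gamma in 1e-3, 1e-2, 1e-1); swap in your own data and values.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Hypothetical grid resembling the question's: C on rows, gamma on columns
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

# Long format: one row per candidate with its C, gamma and mean CV accuracy
long_df = pd.DataFrame(search.cv_results_["params"])
long_df["accuracy"] = search.cv_results_["mean_test_score"]
print(long_df)

# Wide format: rows = C, columns = gamma, cell = mean CV accuracy
table = long_df.pivot(index="C", columns="gamma", values="accuracy")
print(table.round(3))
```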

Parthasarathy Subburaj answered Oct 17 '22 22:10


Perhaps simpler:

import pandas as pd

pd.DataFrame({'param': clf.cv_results_["params"], 'acc': clf.cv_results_["mean_test_score"]})

or:

df = pd.DataFrame(clf.cv_results_)
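The full cv_results_ DataFrame also has param_<name> columns, which makes it easy to both print a ranked summary and draw the table as a heatmap, as the question hoped. This sketch assumes the iris data, a hypothetical C/gamma grid, and matplotlib (not used in the original answers); param_* columns may have object dtype, hence the cast to float before pivoting.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Hypothetical grid for an RBF SVM; substitute your own C and gamma values
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

# Full results; keep the columns of interest and sort by rank
df = pd.DataFrame(search.cv_results_)
summary = df[["param_C", "param_gamma", "mean_test_score", "std_test_score", "rank_test_score"]]
print(summary.sort_values("rank_test_score"))

# Cast the parameter columns to float, then pivot to C (rows) x gamma (columns)
scores = df.astype({"param_C": float, "param_gamma": float}).pivot(
    index="param_C", columns="param_gamma", values="mean_test_score"
)

# Heatmap: gamma on the x axis, C on the y axis, colour = mean CV accuracy
fig, ax = plt.subplots()
im = ax.imshow(scores.to_numpy(), cmap="viridis", origin="lower")
ax.set_xticks(range(scores.shape[1]))
ax.set_xticklabels([f"{g:g}" for g in scores.columns])
ax.set_yticks(range(scores.shape[0]))
ax.set_yticklabels([f"{c:g}" for c in scores.index])
ax.set_xlabel("gamma")
ax.set_ylabel("C")

# Write the accuracy value inside each cell
for i in range(scores.shape[0]):
    for j in range(scores.shape[1]):
        ax.text(j, i, f"{scores.iat[i, j]:.2f}", ha="center", va="center", color="white")

fig.colorbar(im, ax=ax).set_label("mean CV accuracy")
plt.show()
```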
keramat answered Oct 18 '22 00:10