I did a grid search with cross-validation on an SVM with an RBF kernel to find the optimal values of the parameters C and gamma, using the GridSearchCV class. Now I would like to get the results in a tabular format like this:
C/gamma 1e-3 1e-2 1e3
0.1 0.2 .. 0.3
1 0.9
10 ..
100 ..
where each cell contains the accuracy score for that pair of parameter values.
Or, if the first layout is not possible, something simpler like:
C gamma accuracy
0.1 1e-4 0.2
...
I am not very skilled in Python, so I don't know where to start. Could you suggest a way to produce this kind of representation? Ideally the table would be a plot, but a simple console printout in either of these formats would also be fine. Thank you in advance.
GridSearchCV runs through every combination of the parameter values fed into the parameter grid and reports the best combination according to a scoring metric of your choice (accuracy, F1, etc.). It is not perfect, though: the "best parameters" are only the best among the values you supplied, and the exhaustive search is time-consuming.
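To make the "every combination" point concrete, here is a minimal sketch that enumerates a grid with scikit-learn's ParameterGrid, which yields the Cartesian product of the value lists, i.e. the candidates GridSearchCV evaluates. The C and gamma values are illustrative assumptions, not taken from the question. Multiplying by the number of folds gives the number of model fits, which is why the search gets expensive quickly.

```python
from sklearn.model_selection import ParameterGrid

# Illustrative RBF-SVM grid (assumed values): 4 C values x 3 gamma values.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}

# ParameterGrid yields one dict per combination (the Cartesian product).
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 12

# With 5-fold cross-validation, each candidate is fitted once per fold.
n_folds = 5
print(len(candidates) * n_folds)  # 60 fits (plus one final refit on all the data by default)
```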
In one reported comparison, for a parameter grid with 3125 combinations, GridSearchCV took 10856 seconds (~3 hours) whereas HalvingGridSearchCV took 465 seconds (~8 minutes), roughly 23 times faster.
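The timings quoted above come from that comparison and are not reproduced here. As a minimal sketch of the API only, assuming scikit-learn 0.24 or later (HalvingGridSearchCV is still experimental, so it needs the enable_halving_search_cv import), successive halving can be swapped in for GridSearchCV like this; the iris data and grid values are illustrative.

```python
# HalvingGridSearchCV is experimental; this import enables it.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn import datasets, svm

iris = datasets.load_iris()
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}

# factor=3: after each iteration roughly the best third of the candidates
# survive, and each survivor gets three times more resources
# (by default the resource is the number of training samples).
search = HalvingGridSearchCV(
    svm.SVC(kernel="rbf"), param_grid, cv=5, factor=3, random_state=0
)
search.fit(iris.data, iris.target)
print(search.best_params_, search.best_score_)
```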
Yes, GridSearchCV performs cross-validation.
cv: the number of cross-validation folds used to evaluate each candidate set of hyperparameters (a CV splitter object can also be passed).
verbose: set it to 1 (or higher) to print progress details while GridSearchCV is fitting.
n_jobs: the number of jobs to run in parallel; -1 uses all available processors.
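A minimal sketch of how these three arguments are passed, assuming an RBF SVC on the iris dataset and an illustrative C/gamma grid shaped like the one in the question:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# Illustrative grid over the two RBF-kernel hyperparameters (assumed values).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}

clf = GridSearchCV(
    svm.SVC(kernel="rbf"),
    param_grid,
    cv=5,       # 5-fold cross-validation for every (C, gamma) pair
    verbose=1,  # print a short progress summary while fitting
    n_jobs=-1,  # run the fits on all available processors
)
clf.fit(iris.data, iris.target)
print(clf.best_params_, clf.best_score_)
```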
You can use the cv_results_ attribute of the GridSearchCV object, as shown below:
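from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
# Load the iris toy dataset
iris = datasets.load_iris()
# Candidate hyperparameters: every (kernel, C) combination is tried
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svc = svm.SVC(gamma="scale")
# 5-fold cross-validation for each of the 4 combinations
clf = GridSearchCV(svc, parameters, cv=5)
clf.fit(iris.data, iris.target)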
Now inspect clf.cv_results_, which looks like this:
{'mean_fit_time': array([0.00049248, 0.00051575, 0.00051174, 0.00044131]),
'mean_score_time': array([0.0002739 , 0.00027657, 0.00023718, 0.00023627]),
'mean_test_score': array([0.98 , 0.96666667, 0.97333333, 0.98 ]),
'param_C': masked_array(data=[1, 1, 10, 10],
mask=[False, False, False, False],
fill_value='?',
dtype=object),
'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf'],
mask=[False, False, False, False],
fill_value='?',
dtype=object),
'params': [{'C': 1, 'kernel': 'linear'},
{'C': 1, 'kernel': 'rbf'},
{'C': 10, 'kernel': 'linear'},
{'C': 10, 'kernel': 'rbf'}],
'rank_test_score': array([1, 4, 3, 1], dtype=int32),
'split0_test_score': array([0.96666667, 0.96666667, 1. , 0.96666667]),
'split1_test_score': array([1. , 0.96666667, 1. , 1. ]),
'split2_test_score': array([0.96666667, 0.96666667, 0.9 , 0.96666667]),
'split3_test_score': array([0.96666667, 0.93333333, 0.96666667, 0.96666667]),
'split4_test_score': array([1., 1., 1., 1.]),
'std_fit_time': array([1.84329827e-04, 1.34653950e-05, 1.26220210e-04, 1.76294378e-05]),
'std_score_time': array([6.23956317e-05, 1.34498512e-05, 3.57596078e-06, 4.68175419e-06]),
'std_test_score': array([0.01632993, 0.02108185, 0.03887301, 0.01632993])}
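The mean_test_score and std_test_score entries are derived from the per-fold splitK_test_score arrays. A short sketch that refits the same iris example and checks this relationship (NumPy is assumed, as it ships with any scikit-learn install):

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
clf = GridSearchCV(svm.SVC(gamma="scale"),
                   {"kernel": ("linear", "rbf"), "C": [1, 10]}, cv=5)
clf.fit(iris.data, iris.target)
res = clf.cv_results_

# Stack the per-fold scores into shape (n_candidates, n_splits).
split_scores = np.column_stack(
    [res[f"split{i}_test_score"] for i in range(clf.n_splits_)]
)

# mean/std of the test score are the row-wise mean and (population) std.
assert np.allclose(split_scores.mean(axis=1), res["mean_test_score"])
assert np.allclose(split_scores.std(axis=1), res["std_test_score"])

# best_index_ points at a candidate with rank 1.
print(res["params"][clf.best_index_])
```

rank_test_score gives tied candidates the same rank, which is why two rows share rank 1 in the output above; best_index_ points at one of them.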
You can use the params and mean_test_score entries to build the DataFrame you are looking for:
import pandas as pd
pd.concat([pd.DataFrame(clf.cv_results_["params"]), pd.DataFrame(clf.cv_results_["mean_test_score"], columns=["Accuracy"])], axis=1)
The resulting DataFrame looks like this:
C kernel Accuracy
0 1 linear 0.980000
1 1 rbf 0.966667
2 10 linear 0.973333
3 10 rbf 0.980000
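To get the C-by-gamma grid from the question instead of one row per combination, the same long-format DataFrame can be pivoted. A minimal sketch, assuming pandas and an illustrative RBF grid over C and gamma (the values are placeholders, not the asker's):

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
clf = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=5)
clf.fit(iris.data, iris.target)

# Long format: one row per (C, gamma) candidate, the "C gamma accuracy" layout.
results = pd.concat(
    [pd.DataFrame(clf.cv_results_["params"]),
     pd.DataFrame(clf.cv_results_["mean_test_score"], columns=["Accuracy"])],
    axis=1,
)
print(results)

# Wide format: rows are C values, columns are gamma values,
# cells are the mean cross-validated accuracy.
table = results.pivot(index="C", columns="gamma", values="Accuracy")
print(table)
```

Pivoting by column name avoids relying on the order in which GridSearchCV lists the candidates.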
Hope this helps!
Perhaps easier:
pd.DataFrame({'param': clf.cv_results_["params"], 'acc': clf.cv_results_["mean_test_score"]})
Or, to keep every column of cv_results_ in one DataFrame:
df = pd.DataFrame(clf.cv_results_)
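Since a plot would be ideal, here is a minimal sketch that turns the full cv_results_ DataFrame into an annotated heatmap with matplotlib. It assumes matplotlib is installed and uses an illustrative RBF grid; param_C and param_gamma are cast to float before pivoting because they may come back with object dtype.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
clf = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=5).fit(iris.data, iris.target)

df = pd.DataFrame(clf.cv_results_)
# Rows: C, columns: gamma, cells: mean cross-validated accuracy.
table = df.astype({"param_C": float, "param_gamma": float}).pivot(
    index="param_C", columns="param_gamma", values="mean_test_score"
)

fig, ax = plt.subplots()
im = ax.imshow(table.to_numpy(), cmap="viridis")
ax.set_xticks(range(len(table.columns)))
ax.set_xticklabels([f"{g:g}" for g in table.columns])
ax.set_yticks(range(len(table.index)))
ax.set_yticklabels([f"{c:g}" for c in table.index])
ax.set_xlabel("gamma")
ax.set_ylabel("C")

# Write the accuracy value into each cell.
for i in range(table.shape[0]):
    for j in range(table.shape[1]):
        ax.text(j, i, f"{table.iat[i, j]:.2f}", ha="center", va="center", color="w")

fig.colorbar(im, ax=ax, label="mean CV accuracy")
fig.tight_layout()
```

Call fig.savefig(...) to write the figure to a file, or plt.show() with an interactive backend to display it.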