How to do GridSearchCV for F1-score in classification problem with scikit-learn?

I'm working on a multi-class classification problem with a neural network in scikit-learn, and I'm trying to figure out how to optimize my hyperparameters (number of layers, number of neurons per layer, and eventually other parameters).

I found out that GridSearchCV is the way to do it, but the code I'm using returns the mean accuracy, while I actually want to optimize for the F1-score. Does anyone have an idea how I can edit this code to make it work for the F1-score?

At first, when I had to evaluate precision/accuracy, I thought it was 'enough' to just look at the confusion matrix and draw conclusions from it, while changing the number of layers and neurons in my neural network by trial and error, again and again.

Today I found out that there's more to it: GridSearchCV. I just need to figure out how to evaluate the F1-score, because I need to study how the network's performance depends on the number of layers, nodes, and eventually other parameters...

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

# parameter_space must be defined before it is passed to GridSearchCV
parameter_space = {
    'hidden_layer_sizes': [(1), (2), (3)],  # note: (1) is just the int 1, i.e. one hidden layer with 1 neuron
}

mlp = MLPClassifier(max_iter=600)
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, y_train.values.ravel())

print('Best parameters found:\n', clf.best_params_)

means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

output:

Best parameters found:
 {'hidden_layer_sizes': 3}
0.842 (+/-0.089) for {'hidden_layer_sizes': 1}
0.882 (+/-0.031) for {'hidden_layer_sizes': 2}
0.922 (+/-0.059) for {'hidden_layer_sizes': 3}

So my output gives me the mean accuracy (which I found is the default scoring for GridSearchCV). How can I change this to return the average F1-score instead of accuracy?

asked May 10 '19 by Jonas


1 Answer

You can create your own scorer with make_scorer. In this case it wraps sklearn's f1_score, but you can wrap your own metric function if you prefer:

from sklearn.metrics import f1_score, make_scorer

f1 = make_scorer(f1_score, average='macro')


Once you have created your scorer, you can pass it directly to GridSearchCV via the scoring parameter:

clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3, scoring=f1)


Note that I've used average='macro' as the multi-class averaging parameter for f1_score. This calculates the metric for each label and then takes their unweighted mean. There are other options for computing F1 with multiple labels (e.g. 'micro' or 'weighted'); you can find them in the scikit-learn documentation for f1_score.
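
For completeness, here is a minimal end-to-end sketch of the idea on a small synthetic dataset (the data and the n_samples/n_classes values are placeholders, not from the question). It also shows that the built-in string alias 'f1_macro' can be passed as scoring instead of the make_scorer object:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, make_scorer

# synthetic multi-class data, just to make the sketch runnable
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

parameter_space = {'hidden_layer_sizes': [(1), (2), (3)]}

f1 = make_scorer(f1_score, average='macro')   # custom scorer; scoring='f1_macro' is the equivalent built-in alias

clf = GridSearchCV(MLPClassifier(max_iter=600), parameter_space,
                   n_jobs=-1, cv=3, scoring=f1)
clf.fit(X_train, y_train)

# mean_test_score now holds the mean macro F1 for each parameter combination
print(clf.best_params_)
print(clf.cv_results_['mean_test_score'])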



answered Oct 13 '22 by Haritz Laboa