
How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?

I'm trying to use log_loss in the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, but I don't understand how to give it the labels parameter. Even if I passed sklearn.metrics.log_loss directly, the labels present would change on each cross-validation iteration, so how am I supposed to supply them?

I'm using Python v3.6 and Scikit-Learn v0.18.1

How can I use GridSearchCV with log_loss with multi-class model tuning?

My class representation:

1    31
2    18
3    28
4    19
5    17
6    22
Name: encoding, dtype: int64

My code:

param_test = {"criterion": ["friedman_mse", "mse", "mae"]}
gsearch_gbc = GridSearchCV(estimator = GradientBoostingClassifier(n_estimators=10), 
                        param_grid = param_test, scoring="log_loss", n_jobs=1, iid=False, cv=cv_indices)
gsearch_gbc.fit(df_attr, Se_targets)

Here's the tail end of the error (full traceback: https://pastebin.com/1CshpEBN):

ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.

UPDATE: Based on @Grr's answer, I just use this to build the scorer:

log_loss_build = lambda y: metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=sorted(np.unique(y)))
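A sketch of how a scorer with a fixed label set can be plugged into GridSearchCV. The data here is hypothetical (make_classification standing in for df_attr / Se_targets), and the scorer is written as a plain callable rather than via make_scorer, because make_scorer's needs_proba argument has been removed in recent scikit-learn releases; the idea is the same: fixing labels up front lets log_loss score a fold even when that fold is missing a class.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical 6-class data standing in for df_attr / Se_targets.
X, y = make_classification(n_samples=135, n_features=10, n_informative=6,
                           n_classes=6, random_state=0)

def log_loss_scorer_build(y):
    """Return a GridSearchCV-compatible scorer with the label set frozen."""
    all_labels = sorted(np.unique(y))
    def scorer(estimator, X_eval, y_eval):
        proba = estimator.predict_proba(X_eval)
        # Negated so that "greater is better", as GridSearchCV maximizes.
        return -metrics.log_loss(y_eval, proba, labels=all_labels)
    return scorer

gs = GridSearchCV(GradientBoostingClassifier(n_estimators=10),
                  param_grid={"max_depth": [2, 3]},
                  scoring=log_loss_scorer_build(y), cv=3)
gs.fit(X, y)
print(gs.best_score_)  # negative log loss, so higher (closer to 0) is better
```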
asked Apr 12 '17 by O.rka

2 Answers

My assumption is that somehow one of your CV splits ends up with only one class label in y_true. While this seems unlikely given the distribution you posted, I guess it is possible. I haven't run into this before, but per [sklearn.metrics.log_loss](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) the labels argument is required whenever y_true contains a single class, since log_loss cannot infer the full label set from it. The wording of that section of the documentation also suggests the method assumes binary classification when labels is not passed.

So rather than passing sklearn.metrics.log_loss itself, you should wrap it with the labels fixed up front, e.g. metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=your_labels), and pass that as the scoring argument.

answered Nov 14 '22 by Grr

You can simply specify scoring="neg_log_loss" (or "log_loss" in older versions), which uses the built-in negative log loss scorer.
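A minimal sketch of the built-in scorer string, again on hypothetical make_classification data. Note the built-in scorer does not fix labels, but GridSearchCV's default stratified CV keeps every class in each fold, so the single-label error does not arise here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical 6-class data.
X, y = make_classification(n_samples=135, n_features=10, n_informative=6,
                           n_classes=6, random_state=0)

# "neg_log_loss" is the built-in scorer string since scikit-learn 0.18;
# GridSearchCV maximizes it, i.e. minimizes the log loss.
gs = GridSearchCV(GradientBoostingClassifier(n_estimators=10),
                  param_grid={"max_depth": [2, 3]},
                  scoring="neg_log_loss", cv=3)
gs.fit(X, y)
print(gs.best_score_)
```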

answered Nov 15 '22 by Andreas Mueller