I'm trying to use log_loss as the scoring parameter of GridSearchCV to tune this multi-class (6 classes) classifier, but I don't understand how to give it a labels parameter. Even if I passed sklearn.metrics.log_loss directly, the labels present would change at each iteration of the cross-validation, so I don't see how to supply the labels argument.

I'm using Python v3.6 and Scikit-Learn v0.18.1.

How can I use GridSearchCV with log_loss for multi-class model tuning?
My class representation:
1 31
2 18
3 28
4 19
5 17
6 22
Name: encoding, dtype: int64
My code:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_test = {"criterion": ["friedman_mse", "mse", "mae"]}
gsearch_gbc = GridSearchCV(estimator=GradientBoostingClassifier(n_estimators=10),
                           param_grid=param_test, scoring="log_loss",
                           n_jobs=1, iid=False, cv=cv_indices)
gsearch_gbc.fit(df_attr, Se_targets)
Here's the tail end of the error (the full traceback is at https://pastebin.com/1CshpEBN):
ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.
UPDATE: Just use this to build the scorer, based on @Grr's answer:
log_loss_build = lambda y: metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=sorted(np.unique(y)))
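For completeness, here is a minimal sketch of how that scorer plugs into the grid search (assuming the imports below and the same df_attr, Se_targets and cv_indices as above):

import numpy as np
from sklearn import metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Build a scorer whose label set is fixed from the full target column,
# so every CV fold is scored against all six classes.
log_loss_build = lambda y: metrics.make_scorer(
    metrics.log_loss, greater_is_better=False, needs_proba=True,
    labels=sorted(np.unique(y)))

gsearch_gbc = GridSearchCV(
    estimator=GradientBoostingClassifier(n_estimators=10),
    param_grid={"criterion": ["friedman_mse", "mse", "mae"]},
    scoring=log_loss_build(Se_targets),  # labels come from the full target
    n_jobs=1, cv=cv_indices)
gsearch_gbc.fit(df_attr, Se_targets)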
Log loss, aka logistic loss or cross-entropy loss, is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks. It is defined as the negative log-likelihood of a logistic model that returns y_pred probabilities for its training data y_true.
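As a quick numerical check (toy probabilities, hypothetical values), the score is just the negative mean log-probability assigned to each sample's true class:

import numpy as np
from sklearn.metrics import log_loss

y_true = [1, 2, 3]                  # three samples, one per class
y_prob = [[0.8, 0.1, 0.1],          # predicted probability per class
          [0.2, 0.7, 0.1],
          [0.1, 0.2, 0.7]]

# By hand: -mean(log(probability assigned to the true class))
manual = -np.mean(np.log([0.8, 0.7, 0.7]))
print(log_loss(y_true, y_prob, labels=[1, 2, 3]), manual)  # both ~0.312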
My assumption is that somehow your data split has only one class label in y_true. While this seems unlikely based on the distribution you posted, I guess it is possible. I haven't run into this before, but it seems that in [sklearn.metrics.log_loss](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) the labels argument is required when y_true contains only a single label. The wording of that section of the documentation also makes it seem as if the method assumes a binary classification problem when labels is not passed.
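The behaviour is easy to reproduce in isolation (toy fold below, hypothetical probabilities): with a single class in y_true, log_loss raises exactly the error you saw unless labels is supplied:

from sklearn.metrics import log_loss

y_true = [1, 1, 1]                  # a fold containing only class 1
y_prob = [[0.9, 0.05, 0.05],
          [0.8, 0.10, 0.10],
          [0.7, 0.20, 0.10]]

try:
    log_loss(y_true, y_prob)        # ValueError: y_true contains only one label (1)...
except ValueError as exc:
    print(exc)

print(log_loss(y_true, y_prob, labels=[1, 2, 3]))  # fine once labels are given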
Now, as you correctly assume, you need to pass log_loss with the labels fixed up front. Since GridSearchCV expects a scorer rather than a bare metric, wrap it with make_scorer, e.g. scoring=make_scorer(log_loss, greater_is_better=False, needs_proba=True, labels=your_labels).
Alternatively, you can simply specify scoring="neg_log_loss" (or "log_loss" in older versions), which uses the negative log loss.
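A minimal sketch of the string scorer on synthetic data (make_classification stands in for your DataFrame; the variable names are illustrative). GridSearchCV maximizes the score, which is why the log loss is negated:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=150, n_classes=6, n_informative=8,
                           random_state=0)
grid = GridSearchCV(GradientBoostingClassifier(n_estimators=10),
                    param_grid={"max_depth": [2, 3]},
                    scoring="neg_log_loss", cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)  # best_score_ is negative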