 

sklearn log_loss different number of classes

I'm using log_loss from sklearn:

from sklearn.metrics import log_loss
print log_loss(true, pred, normalize=False)

and I get the following error:

ValueError: y_true and y_pred have different number of classes 38, 2

This is really strange to me, since the arrays look valid:

print pred.shape
print np.unique(pred)
print np.unique(pred).size
(19191L,)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37]
38

print true.shape
print np.unique(true)
print np.unique(true).size
(19191L,)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37]
38

What is wrong here? Why does log_loss throw this error?

Sample data:

pred: array([ 0,  1,  2, ...,  3, 12, 16], dtype=int64)
true: array([ 0,  1,  2, ...,  3, 12, 16])
asked Nov 09 '15 by Ablomis


People also ask

What is Logloss metric?

Log-loss is indicative of how close the prediction probability is to the corresponding actual/true value (0 or 1 in case of binary classification). The more the predicted probability diverges from the actual value, the higher is the log-loss value.

What is a good log loss value?

The log loss of a single prediction is simply L(p) = −log(p), where p is the probability attributed to the true class. So L(p) = 0 is ideal (we attributed probability 1 to the right class), while L(p) → +∞ is the worst case (we attributed probability 0 to the actual class).

What is Log_loss in Python?

Log loss, aka logistic loss or cross-entropy loss. This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of a logistic model that returns y_pred probabilities for its training data y_true .

How do you compute log loss in Python?

The log loss can be implemented in Python using the log_loss() function in scikit-learn. In the binary classification case, the function takes a list of true outcome values and a list of probabilities as arguments and calculates the average log loss for the predictions.
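The definition above can be checked directly. This sketch, using made-up binary labels and probabilities, computes the log loss by hand and compares it with scikit-learn's log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up binary labels and predicted probabilities of the positive class
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]

# Log loss is the average of -log(p_i), where p_i is the probability
# the model assigned to the *true* class of sample i
manual = -np.mean([np.log(p) if t == 1 else np.log(1 - p)
                   for t, p in zip(y_true, y_prob)])

print(manual)                     # ~0.1976
print(log_loss(y_true, y_prob))   # same value
```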


3 Answers

It's simple: you are passing the predicted labels, not the predicted probabilities. Your pred variable contains

[ 1 2 1 3 .... ] # class labels: 1, 2 or 3

but log_loss expects something like:

 # each element is an array with the probability of each class
 [[ 0.1, 0.8, 0.1], [ 0.0, 0.79, 0.21], .... ]

To obtain these probabilities, use predict_proba:

pred = model.predict_proba(x_test)
loss = log_loss(y_true, pred)
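A self-contained version of this fix, using synthetic data and a LogisticRegression as a stand-in for the asker's (unspecified) model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic 38-class problem standing in for the asker's data
X, y = make_classification(n_samples=2000, n_features=40, n_informative=30,
                           n_classes=38, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

hard = model.predict(X)         # shape (2000,): class labels -- NOT valid input
proba = model.predict_proba(X)  # shape (2000, 38): one probability per class

print(log_loss(y, proba))       # works: each row sums to 1 over the 38 classes
```

Passing `hard` instead of `proba` reproduces the asker's error, because the 1-D array of labels is interpreted as binary positive-class probabilities.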
answered Sep 30 '22 by deltascience


Inside log_loss, the true array is fit and transformed by a LabelBinarizer, which changes its dimensionality (one column per class). So the fact that true and pred have the same shape going in does not guarantee that log_loss will accept them, because true's shape changes after binarization. If you have only two classes, passing 1-D arrays works; with more classes, pred must be a full (n_samples, n_classes) probability matrix.
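Related to the internal LabelBinarizer: log_loss also accepts a labels argument, which tells it which class each probability column represents, useful when a batch of y_true does not contain every class. A minimal sketch with made-up probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Made-up probability matrix whose 38 columns cover every class,
# normalized so each row sums to 1
proba = rng.random((5, 38))
proba /= proba.sum(axis=1, keepdims=True)

# This batch of y_true does not contain all 38 classes
y_true = [0, 3, 7, 12, 37]

# labels= tells the internal LabelBinarizer which class each column represents
loss = log_loss(y_true, proba, labels=list(range(38)))
print(loss)
```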

answered Sep 30 '22 by Hima Varsha


From the log_loss documentation:

y_pred : array-like of float, shape = (n_samples, n_classes) or (n_samples,)

Predicted probabilities, as returned by a classifier’s predict_proba method. If y_pred.shape = (n_samples,) the probabilities provided are assumed to be that of the positive class. The labels in y_pred are assumed to be ordered alphabetically, as done by preprocessing.LabelBinarizer.

You need to pass probabilities, not the predicted labels.
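A small illustration of the two accepted shapes in the binary case (values are made up):

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1]

# 1-D y_pred: probabilities of the positive class only
loss_1d = log_loss(y_true, [0.2, 0.9, 0.7])

# 2-D y_pred: one column per class; equivalent in the binary case
loss_2d = log_loss(y_true, [[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]])

print(loss_1d, loss_2d)  # identical values
```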

answered Sep 30 '22 by ug2409