I ran a Random Forest classifier for my multi-class, multi-label output variable and got the output below.
My y_test values:
Degree Nature
762721 1 7
548912 0 6
727126 1 12
14880 1 12
189505 1 12
657486 1 12
461004 1 0
31548 0 6
296674 1 7
121330 0 17
Predicted output:
[[ 1. 7.]
[ 0. 6.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 0.]
[ 0. 6.]
[ 1. 7.]
[ 0. 17.]]
Now I want to check the performance of my classifier. I found that Hamming loss or jaccard_similarity_score are good metrics for multiclass multilabel problems, but when I tried to calculate them I got a ValueError.
Error:
ValueError: multiclass-multioutput is not supported
These are the lines I tried:
from sklearn.metrics import hamming_loss, jaccard_similarity_score
print(hamming_loss(y_test, RF_predicted))
print(jaccard_similarity_score(y_test, RF_predicted))
Thanks,
Accuracy is simply the number of correct predictions divided by the total number of examples. If we consider a prediction correct only when the predicted label vector is exactly equal to the ground-truth vector, then a model that gets 1 out of 4 instances fully right would have an accuracy of 1 / 4 = 0.25 = 25%.
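As a rough sketch (the 4-instance arrays here are made up purely for illustration), this exact-match notion of accuracy can be computed with NumPy by comparing whole rows:
import numpy as np

# made-up ground truth and predictions: 4 instances, 2 outputs each
y_true = np.array([[1, 7], [0, 6], [1, 12], [1, 0]])
y_pred = np.array([[1, 7], [0, 5], [1, 11], [0, 0]])

# an instance counts as correct only if every output in the row matches
exact_match = np.all(y_true == y_pred, axis=1)
print(exact_match.mean())  # 0.25 -> only the first row is fully correct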
In terms of confusion-matrix counts, accuracy is (TP + TN) / (TP + TN + FP + FN). The misclassification rate (also called classification error) is the fraction of incorrect predictions: (FP + FN) / (TP + TN + FP + FN), which is the same as 1 - accuracy.
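For instance (with an invented binary example), both quantities follow directly from the confusion-matrix counts:
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / float(tp + tn + fp + fn)
error_rate = (fp + fn) / float(tp + tn + fp + fn)
print(accuracy, error_rate)  # 0.666..., 0.333..., and error_rate == 1 - accuracy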
For multilabel classification, we count the false positives and false negatives per instance and then average over all instances.
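A minimal sketch of that idea, assuming the targets are already in binary indicator form (the matrices below are invented):
import numpy as np

# rows = instances, columns = labels (binary indicator format)
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
Y_pred = np.array([[1, 1, 1], [0, 0, 0], [1, 1, 0]])

fp = np.logical_and(Y_pred == 1, Y_true == 0).sum(axis=1)  # false positives per instance
fn = np.logical_and(Y_pred == 0, Y_true == 1).sum(axis=1)  # false negatives per instance
print((fp + fn).mean())  # average number of label errors per instance, here 0.666...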
In Python, the accuracy_score function of the sklearn.metrics package calculates the accuracy score for a set of predicted labels against the true labels.
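For example (single-label data made up for illustration):
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 1]
print(accuracy_score(y_true, y_pred))  # 0.8
Note, though, that accuracy_score rejects the multiclass-multioutput format from the question with the same ValueError, which is why the manual approach below helps.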
To compute the Hamming loss yourself for the multiclass / multilabel case that sklearn does not support directly, you could:
import numpy as np

y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])

# fraction of individual label entries that differ, i.e. the Hamming loss
np.sum(np.not_equal(y_true, y_pred)) / float(y_true.size)
0.75
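Applied to your own arrays, you could wrap the same computation in a small helper (the function name is just illustrative):
import numpy as np

def hamming_like_loss(y_true, y_pred):
    # fraction of individual label entries that differ between truth and prediction
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum(np.not_equal(y_true, y_pred)) / float(y_true.size)

# e.g. hamming_like_loss(y_test, RF_predicted)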
You can also get the confusion_matrix for each of the two labels like so:
from sklearn.metrics import confusion_matrix, precision_score
np.random.seed(42)
y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[0 4]
[1 4]
[0 4]
[0 4]
[0 2]
[1 4]
[0 3]
[0 2]
[0 3]
[1 3]]
y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[1 2]
[1 2]
[1 4]
[1 4]
[0 4]
[0 3]
[1 4]
[1 3]
[1 3]
[0 4]]
confusion_matrix(y_true[:, 0], y_pred[:, 0])
[[1 6]
[2 1]]
confusion_matrix(y_true[:, 1], y_pred[:, 1])
[[0 1 1]
[0 1 2]
[2 1 2]]
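If you also want a per-label accuracy, it can be read off the diagonal of each confusion matrix, e.g. for the second label:
cm = confusion_matrix(y_true[:, 1], y_pred[:, 1])
print(np.trace(cm) / float(cm.sum()))  # 0.3 -> fraction of correct predictions for the second label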
You could also calculate the precision_score like so (or the recall_score in a similar way):
precision_score(y_true[:, 0], y_pred[:, 0])
0.142857142857
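For the second label, which has more than two classes, precision_score (and recall_score) needs an averaging strategy, for example:
precision_score(y_true[:, 1], y_pred[:, 1], average='macro')
0.244444444444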