Hello, I am working with sklearn and, in order to better understand the metrics, I followed this example of precision_score:
from sklearn.metrics import precision_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(precision_score(y_true, y_pred, average='macro'))
The result that I got was the following:
0.222222222222
I understand that sklearn computes that result following these steps:
- for label 0 the precision is tp / (tp + fp) = 2 / (2 + 1) = 0.66
- for label 1 the precision is 0 / (0 + 2) = 0
- for label 2 the precision is 0 / (0 + 1) = 0
and finally sklearn calculates the mean precision over all three labels: precision = (0.66 + 0 + 0) / 3 = 0.22
This result is given if we use these parameters:
precision_score(y_true, y_pred, average='macro')
On the other hand, if we use these parameters, changing to average='micro':
precision_score(y_true, y_pred, average='micro')
then we get:
0.33
and if we take average='weighted':
precision_score(y_true, y_pred, average='weighted')
then we obtain:
0.22.
I don't understand how sklearn computes this metric when the average parameter is set to 'weighted' or 'micro'. I would really appreciate it if someone could give me a clear explanation.
Accuracy using sklearn's accuracy_score(): the accuracy_score() method of sklearn.metrics accepts the true labels of the sample and the labels predicted by the model as its parameters, and returns the accuracy score as a float value.
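As a quick sanity check, here is a minimal sketch reusing the y_true and y_pred from the question; the result is simply 2 correct predictions out of 6:
from sklearn.metrics import accuracy_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Fraction of predictions that exactly match the true label: 2 out of 6
print(accuracy_score(y_true, y_pred))  # 0.3333...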
The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
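Applied to the question's example, a small sketch (the per-label values can be verified by hand: label 0 is found 2 times out of 2, labels 1 and 2 are never found):
from sklearn.metrics import recall_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Per-label recall tp / (tp + fn) for each class
print(recall_score(y_true, y_pred, average=None))     # [1. 0. 0.]
print(recall_score(y_true, y_pred, average='macro'))  # 0.3333...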
'micro': Calculate metrics globally by considering each element of the label indicator matrix as a label.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted': Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).
'samples': Calculate metrics for each instance, and find their average.
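Putting these definitions together for the question's y_true and y_pred, here is a sketch that reproduces the three numbers (0.22, 0.33, 0.22) by hand:
from sklearn.metrics import precision_score
import numpy as np

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
labels = [0, 1, 2]

# Per-label precision tp / (tp + fp): label 0 -> 2/3, labels 1 and 2 -> 0
per_label = precision_score(y_true, y_pred, average=None, labels=labels)
print(per_label)                               # [0.6667 0.     0.    ]

# 'macro': unweighted mean of the per-label precisions
print(per_label.mean())                        # 0.2222

# 'micro': global tp / (tp + fp); for single-label multiclass data this is
# just the number of correct predictions over the total -> 2 / 6
tp_total = sum(t == p for t, p in zip(y_true, y_pred))
print(tp_total / len(y_pred))                  # 0.3333

# 'weighted': per-label precisions averaged with weights equal to support
# (the number of true instances of each label); here every support is 2
support = np.array([y_true.count(c) for c in labels])
print(np.average(per_label, weights=support))  # 0.2222
Because all three labels have the same support in this example, 'weighted' coincides with 'macro'; with imbalanced classes the two would differ.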
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html
For support measures, see http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
Support is basically class membership: the number of true instances of each label.
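For instance, classification_report prints that count directly in its support column (same example data as above):
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# The rightmost "support" column counts the true instances of each label: 2, 2, 2
print(classification_report(y_true, y_pred))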
3.3.2.12. Receiver operating characteristic (ROC)
The function roc_curve computes the receiver operating characteristic curve, or ROC curve. Quoting Wikipedia:
“A receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity, and FPR is one minus the specificity or true negative rate.”
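A minimal sketch with hypothetical binary labels and scores (not the question's multiclass data, since roc_curve expects a binary problem plus a continuous decision score):
from sklearn.metrics import roc_curve, auc
# Hypothetical binary ground truth and classifier decision scores
y_bin  = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve(y_bin, scores)
print(fpr)            # false positive rate at each threshold
print(tpr)            # true positive rate at each threshold
print(auc(fpr, tpr))  # area under the ROC curve, 0.75 here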
TN / True Negative: case was negative and predicted negative.
TP / True Positive: case was positive and predicted positive.
FN / False Negative: case was positive but predicted negative.
FP / False Positive: case was negative but predicted positive.

# Basic terminology
from sklearn import metrics

# expected holds the true binary labels, predicted the model's binary predictions
confusion = metrics.confusion_matrix(expected, predicted)
print(confusion, "\n")

# In a binary confusion matrix, rows are true classes and columns are predicted classes
TN, FP = confusion[0, 0], confusion[0, 1]
FN, TP = confusion[1, 0], confusion[1, 1]

print('Specificity: ', round(TN / (TN + FP), 3) * 100, "\n")
print('Sensitivity: ', round(TP / (TP + FN), 3) * 100, "(Recall)")