I have ratings for 60 cases by 3 raters. These are in lists ordered by document: the first element is the rating of the first document, the second is the rating of the second document, and so on:
rater1 = [-8,-7,8,6,2,-5,...]
rater2 = [-3,-5,3,3,2,-2,...]
rater3 = [-4,-2,1,0,0,-2,...]
Is there a python implementation of Cohen's Kappa somewhere? I couldn't find anything in numpy or scipy, and nothing here on stackoverflow, but maybe I missed it? This is quite a common statistic, so I'm surprised I can't find it for a language like Python.
Cohen suggested the Kappa result be interpreted as follows: values ≤ 0 as indicating no agreement, 0.01–0.20 as none to slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement.
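If you want that interpretation scale in code, here is a tiny helper (the function name is my own; the band boundaries are the ones quoted above):

def interpret_kappa(kappa):
    # Band boundaries follow Cohen's suggested interpretation
    if kappa <= 0:
        return "no agreement"
    elif kappa <= 0.20:
        return "none to slight"
    elif kappa <= 0.40:
        return "fair"
    elif kappa <= 0.60:
        return "moderate"
    elif kappa <= 0.80:
        return "substantial"
    else:
        return "almost perfect"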
To calculate the kappa coefficient we take the observed probability of agreement minus the probability of agreement expected by chance, divided by 1 minus that chance-agreement probability. A positive kappa means the raters agree more often than chance alone would predict.
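For reference, here is a minimal from-scratch sketch of that calculation, assuming two equally long lists of discrete labels (the function and variable names are my own):

from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    n = len(ratings_a)
    # Observed agreement: fraction of items both raters labelled identically
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa([2, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 2]))  # 3/7 ≈ 0.43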
Cohen's kappa was introduced in scikit-learn 0.17:
sklearn.metrics.cohen_kappa_score(y1, y2, labels=None, weights=None)
Example:
from sklearn.metrics import cohen_kappa_score
labeler1 = [2, 0, 2, 2, 0, 1]
labeler2 = [0, 0, 2, 2, 0, 2]
print(cohen_kappa_score(labeler1, labeler2))  # ≈ 0.43
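Note that Cohen's kappa is defined for exactly two raters, so with the three raters in the question you would compute it once per pair. A sketch using only the six visible values from the lists above (cohen_kappa_score treats each distinct rating value as a separate category):

from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Only the six values shown in the question; the full lists have 60 entries
rater1 = [-8, -7, 8, 6, 2, -5]
rater2 = [-3, -5, 3, 3, 2, -2]
rater3 = [-4, -2, 1, 0, 0, -2]

for (name_a, a), (name_b, b) in combinations(
        [("rater1", rater1), ("rater2", rater2), ("rater3", rater3)], 2):
    print(name_a, "vs", name_b, cohen_kappa_score(a, b))

Since the ratings in the question look ordinal rather than purely categorical, the weights="linear" or weights="quadratic" option of cohen_kappa_score may be more appropriate than the default unweighted kappa.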