
Inter-rater agreement in Python (Cohen's Kappa)

I have ratings for 60 cases from 3 raters. These are in lists ordered by document: the first element is the rating of the first document, the second element the rating of the second document, and so on:

rater1 = [-8,-7,8,6,2,-5,...]
rater2 = [-3,-5,3,3,2,-2,...]
rater3 = [-4,-2,1,0,0,-2,...]

Is there a Python implementation of Cohen's Kappa somewhere? I couldn't find anything in numpy or scipy, and nothing here on Stack Overflow, but maybe I missed it? This is quite a common statistic, so I'm surprised I can't find it for a language like Python.

asked Jul 17 '12 by Zach

People also ask

What is a good kappa for inter-rater reliability?

Cohen suggested the Kappa result be interpreted as follows: values ≤ 0 as indicating no agreement and 0.01–0.20 as none to slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement.

How is the Cohen kappa score calculated?

The Kappa coefficient is the observed agreement minus the agreement expected by chance, divided by 1 minus the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A positive value means the raters agree more often than chance alone would produce.
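
For illustration, here is a minimal sketch of that formula in plain Python for two raters with nominal labels (the function name and the example call below are my own, not from any library):

from collections import Counter

def cohens_kappa(r1, r2):
    # kappa = (p_o - p_e) / (1 - p_e)
    n = len(r1)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of each rater's label frequencies, summed over labels.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa([2, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 2]))  # ≈ 0.43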


1 Answer

Cohen's kappa was introduced in scikit-learn 0.17:

sklearn.metrics.cohen_kappa_score(y1, y2, labels=None, weights=None)

Example:

from sklearn.metrics import cohen_kappa_score

# Ratings from two annotators on the same six items.
labeler1 = [2, 0, 2, 2, 0, 1]
labeler2 = [0, 0, 2, 2, 0, 2]
print(cohen_kappa_score(labeler1, labeler2))  # ≈ 0.43
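
Note that cohen_kappa_score compares exactly two raters. For the three raters in the question, one option (my assumption, not something this answer prescribes) is to compute kappa for every pair, shown here with only the six values quoted in the question:

from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Only the values shown in the question; the full lists have 60 entries each.
raters = {
    "rater1": [-8, -7, 8, 6, 2, -5],
    "rater2": [-3, -5, 3, 3, 2, -2],
    "rater3": [-4, -2, 1, 0, 0, -2],
}
for (name_a, a), (name_b, b) in combinations(raters.items(), 2):
    print(name_a, "vs", name_b, cohen_kappa_score(a, b))

Since these ratings are ordered scores rather than unordered categories, the weights='linear' or weights='quadratic' option of cohen_kappa_score may be a better fit than the default unweighted kappa; Fleiss' kappa (e.g. statsmodels.stats.inter_rater.fleiss_kappa) is another option designed for more than two raters.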

As a reminder, the interpretation of kappa values from {1}:

  ≤ 0          No agreement
  0.01–0.20    None to slight agreement
  0.21–0.40    Fair agreement
  0.41–0.60    Moderate agreement
  0.61–0.80    Substantial agreement
  0.81–1.00    Almost perfect agreement
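
If you want to turn a score into one of the labels above, a small helper (purely illustrative; the cut-offs are the ones quoted from {1}) might look like:

def interpret_kappa(k):
    # Thresholds from the interpretation table quoted above.
    if k <= 0:
        return "no agreement"
    for upper, label in [(0.20, "none to slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if k <= upper:
            return label + " agreement"
    return "almost perfect agreement"

print(interpret_kappa(0.43))  # moderate agreement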


References:

  • {1} Viera, Anthony J., and Joanne M. Garrett. "Understanding interobserver agreement: the kappa statistic." Fam Med 37, no. 5 (2005): 360-363. https://www.ncbi.nlm.nih.gov/pubmed/15883903
answered Sep 22 '22 by Franck Dernoncourt