 

Inter-annotator agreement when annotators assign more than one category per subject

I want to find the inter-annotator agreement for a few annotators. Each annotator assigns a few categories (out of 10 categories) to each subject.

For example, there are 3 annotators, 10 categories, and 100 subjects.

I am aware of http://en.wikipedia.org/wiki/Cohen's_kappa (for two annotators) and http://en.wikipedia.org/wiki/Fleiss%27_kappa (for more than two annotators) as inter-annotator agreement measures, but I realized that they may not work if an annotator assigns more than one category to a subject.

Does anyone have an idea for determining inter-annotator agreement in this scenario?

Thanks



2 Answers

I had to do this several years back. I can't recall exactly how I did it (I don't have the code anymore), but I have a worked example that I reported to my professor. I was dealing with annotation of comments and had 56 categories and 4 annotators.

Note: at the time I needed a way to detect where annotators disagreed most, so that after each annotation session they could focus on why they disagreed and set out reasonable rules to maximize this statistic. It worked well for that purpose.

Let's assume A-D are annotators and 1-5 are categories. This is a possible scenario.

     A      B      C    D     Probability of agreement
1    X      X      X    X        4/4
2    X      X      X             3/4
3    X      X                    2/4
4    X                           1/4
5 

A tags this comment with categories 1, 2, 3, 4; B with 1, 2, 3; and so forth.

For each category, the probability of agreement is calculated. The sum of these probabilities is then divided by the number of unique categories tagged for that particular comment.

Therefore, for the example comment, we have (4/4 + 3/4 + 2/4 + 1/4) / 4 = 10/16 as the annotators' agreement. This is a value between 0 and 1.
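A minimal sketch of this per-comment measure, assuming each annotator's labels for a subject are given as a set of category ids (the function name `subject_agreement` is made up for illustration):

    # Per-comment agreement as described above: average, over the unique
    # categories tagged, of the fraction of annotators who used each category.
    def subject_agreement(annotations):
        """annotations: list of sets, one set of category labels per annotator."""
        n_annotators = len(annotations)
        # All categories tagged by at least one annotator for this subject.
        tagged = set().union(*annotations)
        if not tagged:
            return 1.0  # nobody tagged anything: treat as full agreement
        # For each tagged category, the fraction of annotators who used it.
        per_category = [
            sum(cat in a for a in annotations) / n_annotators for cat in tagged
        ]
        # Divide by the number of unique categories tagged for this comment.
        return sum(per_category) / len(tagged)

    # The worked example: A -> {1,2,3,4}, B -> {1,2,3}, C -> {1,2}, D -> {1}
    print(subject_agreement([{1, 2, 3, 4}, {1, 2, 3}, {1, 2}, {1}]))  # 0.625 == 10/16

Averaging this value over all subjects gives a single agreement score for the annotation session.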

If this doesn't work for you, see http://www.mitpressjournals.org/doi/pdf/10.1162/coli.07-034-R2, p. 567, which is referenced by the case study on p. 587.



Compute agreement on a per-label basis. If you treat one of the annotators as the gold standard, you can compute recall and precision on label assignments. Another option is label overlap: for each category, the proportion of subjects where both annotators assigned it out of those where either annotator assigned it (intersection over union).
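A minimal sketch of both measures, assuming each annotator's output is a dict mapping subject id to a set of category labels (the names `gold`, `other`, `label_precision_recall`, and `label_overlap` are illustrative):

    def label_precision_recall(gold, other, label):
        """Precision/recall of `other` against `gold` for one category label."""
        gold_pos = {s for s, labels in gold.items() if label in labels}
        other_pos = {s for s, labels in other.items() if label in labels}
        tp = len(gold_pos & other_pos)
        precision = tp / len(other_pos) if other_pos else 1.0
        recall = tp / len(gold_pos) if gold_pos else 1.0
        return precision, recall

    def label_overlap(a, b, label):
        """Intersection over union: subjects where both assigned the label,
        divided by subjects where either did."""
        a_pos = {s for s, labels in a.items() if label in labels}
        b_pos = {s for s, labels in b.items() if label in labels}
        union = a_pos | b_pos
        return len(a_pos & b_pos) / len(union) if union else 1.0

    gold  = {"s1": {1, 2}, "s2": {3}}
    other = {"s1": {1},    "s2": {3}}
    print(label_precision_recall(gold, other, 2))  # (1.0, 0.0): `other` never assigned label 2
    print(label_overlap(gold, other, 1))           # 1.0: both assigned label 1 to exactly {"s1"}

Averaging either statistic across categories (and, for more than two annotators, across annotator pairs) gives an overall per-label agreement figure.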



