I would like to compute a 95% confidence interval for Krippendorff's alpha (the inter-rater reliability coefficient) by bootstrapping, using the R package irr.
Let's use the "C" data from Krippendorff that appears in the irr package, together with the R script that computes Krippendorff's alpha once:
# the "C" data from Krippendorff
#rater per row; rated subject per column; NAs allowed
library(irr)
nmm<-matrix(c(1,1,NA,1,2,2,3,2,3,3,3,3,3,3,3,3,2,2,2,2,1,2,3,4,4,4,4,4,
1,1,2,1,2,2,2,2,NA,5,5,5,NA,NA,1,1,NA,NA,3,NA),nrow=4)
kripp.alpha(nmm,"ordinal")
You can use the boot function from the boot package to bootstrap values. Here I'll bootstrap the set of subjects but keep the raters fixed:
library(boot)
library(irr)
ka <- function(data, indices) kripp.alpha(nmm[,indices], "ordinal")$value
b <- boot(seq(ncol(nmm)), ka, 1000)
Now you can use the boot.ci function to compute a 95% confidence interval for the bootstrapped value; I'll use the percentile confidence interval, but others are available (check out ?boot.ci):
boot.ci(b, type="perc")
# BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
# Based on 1000 bootstrap replicates
#
# CALL :
# boot.ci(boot.out = b, type = "perc")
#
# Intervals :
# Level Percentile
# 95% ( 0.4297, 1.0000 )
# Calculations and Intervals on Original Scale
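For intuition about what the percentile interval reports, it is roughly the empirical 2.5% and 97.5% quantiles of the bootstrap replicates stored in the `t` component of the boot object. The sketch below recomputes the bootstrap so that it runs on its own; the seed and the names `ka_cols`, `b2` and `ci_by_hand` are my own choices, not part of the answer above. Note that quantile() interpolates a little differently from boot.ci, so the endpoints may not match to every decimal.

```r
library(boot)
library(irr)

# Same "C" data as in the question: 4 raters (rows) x 12 subjects (columns)
nmm <- matrix(c(1,1,NA,1,2,2,3,2,3,3,3,3,3,3,3,3,2,2,2,2,1,2,3,4,4,4,4,4,
                1,1,2,1,2,2,2,2,NA,5,5,5,NA,NA,1,1,NA,NA,3,NA), nrow = 4)

# Statistic: alpha computed on a resampled set of subject columns
ka_cols <- function(data, indices) kripp.alpha(nmm[, indices], "ordinal")$value

set.seed(1)  # arbitrary seed, only so the run is reproducible
b2 <- boot(seq_len(ncol(nmm)), ka_cols, R = 1000)

# Approximate percentile interval taken directly from the replicates
ci_by_hand <- quantile(b2$t, c(0.025, 0.975), na.rm = TRUE)
ci_by_hand
```

Plotting `hist(b2$t)` is a quick way to see how skewed the replicates are, which helps when deciding whether a percentile interval is a reasonable choice.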
Unfortunately, the bootstrap solution given by josliber doesn't do what you think it does. The problem is that boot() expects data in an n × m matrix (one row per resampled unit), while kripp.alpha() expects data in an m × n matrix (one row per rater). The solution given will run, as shown, but the resampling being done is not by subjects but by raters. With only 4 raters in the example data set there is a small number of possible samples, and a resampled set may come entirely from a single rater (hence the confidence interval includes 1.0).
One solution is to keep your data in the n × m form that boot() uses, and to transpose the resampled matrix before passing it to kripp.alpha().
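# data: subjects in rows, raters in columns; x: row indices supplied by boot()
alpha.boot <- function(data, x) {
  # resample subjects (rows), then transpose to the rater-by-subject layout
  d <- t(data[x, , drop = FALSE])
  # method = "nominal" here; use "ordinal" for the data in the question
  kripp.alpha(d, method = "nominal")$value
}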
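To make the transpose approach concrete, here is a minimal sketch of the full workflow on the question's data. When boot() is given a matrix it resamples rows, so the data are first put into subject-by-rater form. The names `subj_by_rater`, `alpha_rows` and `b_rows`, the seed, and the choice of "ordinal" (to match the question) are my assumptions rather than part of the answer above.

```r
library(boot)
library(irr)

# "C" data from the question: 4 raters (rows) x 12 subjects (columns)
nmm <- matrix(c(1,1,NA,1,2,2,3,2,3,3,3,3,3,3,3,3,2,2,2,2,1,2,3,4,4,4,4,4,
                1,1,2,1,2,2,2,2,NA,5,5,5,NA,NA,1,1,NA,NA,3,NA), nrow = 4)

# boot() resamples rows of a matrix, so put subjects in rows: 12 x 4
subj_by_rater <- t(nmm)

# Statistic: pick resampled subject rows, transpose back for kripp.alpha()
alpha_rows <- function(data, idx) {
  kripp.alpha(t(data[idx, , drop = FALSE]), method = "ordinal")$value
}

set.seed(1)  # arbitrary seed, only so the run is reproducible
b_rows <- boot(subj_by_rater, alpha_rows, R = 1000)

# Percentile interval from the subject-level bootstrap
boot.ci(b_rows, type = "perc")
```

The `drop = FALSE` keeps the selection a matrix even if a resample were to reduce to a single row, which would otherwise make the transpose behave unexpectedly.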
Probably too late to be helpful for Paul, but for future reference, note that none of the proposed methods is consistent with the bootstrap algorithm described by Klaus Krippendorff (http://web.asc.upenn.edu/usr/krippendorff/boot.c-Alpha.pdf).
In that algorithm the repeated samples draw neither raters nor units, but "hypothetical reliability data from the matrix of observed coincidences among pairs of values assigned to units in independent replications" (Krippendorff 2016). Thus, conventional bootstrap implementations will not give the answer intended by Krippendorff.
Best Daniel