I have many data sets that are inputs to a function. The data is stored in a data.table, and I'm calculating confidence intervals for my function output. However, in some cases all of the input data is the same, which produces the message: "All values of t are equal to 100 \n Cannot calculate confidence intervals". How can I avoid this (e.g., by setting the confidence interval to an arbitrary value like 0 or NA when all values are equal)? For example:
library(boot)
library(data.table)
problem=1
data<-data.table(column1=c(1:100),column2=c(rep(100,99),problem))
resample.number=1000
confidence=0.95
sample.mean<-function(indata,x){mean(indata[x])}
boot_obj<-lapply(data,boot,statistic = sample.mean,R = resample.number)
boot.mean.f<-function(x,column){
x[column][1]
}
means<-data.table(sapply(boot_obj,boot.mean.f))
bootci_obj<-lapply(boot_obj,boot.ci, conf = confidence, type = "perc")
bootci.f<-function(x,column){
x<-x[column][4]
x<-unlist(strsplit(as.character(x[1]),","))
x<-sub("[:punct:].*","",x)
x<-sub("lis.*","",x)
x<-sub(").?","",x)
x<-na.omit(as.numeric(x))
}
cis<-data.table(t(sapply(bootci_obj,bootci.f)))
setnames(means,"V1","stat")
cis[,V1:=NULL]
cis[,V2:=NULL]
setnames(cis,c("V3","V4"),c("lci","uci"))
cbind(means,cis)
returns:
stat lci uci
1: 50.5 44.96025 56.26797
2: 99.01 97.03000 100.00000
Changing
problem=100
returns: "All values of t are equal to 100 \n Cannot calculate confidence intervals", which leads to other errors.
I would like the result to be:
stat lci uci
1: 50.5 44.96025 56.26797
2: 100.0 0.0000 0.00000
I stacked the data.table because it's much more efficient to work with a data.table in long format. I also prefer to set both confidence limits to the mean when all values are equal. Adjust as you like.
library(boot)
library(data.table)
DT <- data.table(column1=1:100,column2=rep(100,100))
DT <- data.table(stack(DT))
resample.number=1000
confidence=0.95
sample.mean <- function(indata,x){mean(indata[x])}
ci.mean <- function(x, resample.number,confidence) {
if(length(unique(x)) > 1) {
temp <- boot.ci(boot(x,statistic = sample.mean,R = resample.number), conf = confidence, type = "perc")$percent
list(mean=mean(x),lwr=temp[,4],upr=temp[,5])
} else {
list(mean=mean(x),lwr=mean(x),upr=mean(x))
}
}
set.seed(42)
DT[,ci.mean(values,resample.number,confidence),by=ind]
# ind mean lwr upr
#1: column1 50.5 44.92305 55.93949
#2: column2 100.0 100.00000 100.00000
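To make the stacking step easier to follow, here is a minimal sketch of what stack() produces. The small table `wide` is made up for illustration, and I use a plain data.frame as input to keep the example simple. The point is that stack() names its output columns `values` and `ind`, which is why the grouped call above refers to `values` and `by=ind`.

```r
library(data.table)

# Hypothetical toy input, not the data from the question
wide <- data.frame(column1 = 1:3, column2 = rep(100, 3))

# stack() puts all values into one column and records the source column in 'ind'
long <- data.table(stack(wide))
long
names(long)
```

Each original column becomes one group of rows, so any per-column computation turns into a `by=ind` call.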
Note that boot.ci does not stop with an error when all values are equal: it reports the message and yields no usable interval. If your downstream code can handle a missing result (for example, by treating it as NA), the if condition is not strictly needed.
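If you would rather get NA limits than repeat the mean, one option is a small wrapper like the sketch below. The name `safe.perc.ci` is my own and not part of the boot package. It assumes that, for constant input, boot.ci either returns NULL or an object without a usable `percent` component, and it also catches errors just in case. boot.ci may still print its message.

```r
library(boot)

sample.mean <- function(indata, i) mean(indata[i])

# Hypothetical helper: percentile CI limits, or NA when no interval is available
safe.perc.ci <- function(x, R = 1000, conf = 0.95) {
  b <- boot(x, statistic = sample.mean, R = R)
  # Treat an error the same way as a missing result
  ci <- tryCatch(boot.ci(b, conf = conf, type = "perc"),
                 error = function(e) NULL)
  if (is.null(ci) || is.null(ci$percent)) {
    return(c(lwr = NA_real_, upr = NA_real_))
  }
  # Columns 4 and 5 of the percent matrix hold the lower and upper limits
  c(lwr = ci$percent[1, 4], upr = ci$percent[1, 5])
}

set.seed(42)
constant.ci <- safe.perc.ci(rep(100, 100))  # all values equal
varying.ci  <- safe.perc.ci(1:100)          # ordinary case
```

The same function can be used per group in the long table, e.g. by wrapping its result with as.list() inside the `by=ind` call.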