Calculate within and between variances and confidence intervals in R

Tags:

r

statistics

I need to calculate the within-run and between-run variances from some data as part of developing a new analytical chemistry method. I also need confidence intervals from this data, using the R language.

I assume I need to use ANOVA or something similar?

My data looks like this:

> variance
   Run Rep Value
1    1   1  9.85
2    1   2  9.95
3    1   3 10.00
4    2   1  9.90
5    2   2  8.80
6    2   3  9.50
7    3   1 11.20
8    3   2 11.10
9    3   3  9.80
10   4   1  9.70
11   4   2 10.10
12   4   3 10.00

asked Sep 09 '09 20:09

PaulHurleyuk

4 Answers

You have four groups of three observations:

> run1 = c(9.85, 9.95, 10.00)
> run2 = c(9.90, 8.80, 9.50)
> run3 = c(11.20, 11.10, 9.80)
> run4 = c(9.70, 10.10, 10.00)
> runs = c(run1, run2, run3, run4)
> runs
 [1]  9.85  9.95 10.00  9.90  8.80  9.50 11.20 11.10  9.80  9.70 10.10 10.00

Make some labels:

> n = rep(3, 4)
> group = rep(1:4, n)
> group
 [1] 1 1 1 2 2 2 3 3 3 4 4 4

Calculate within-run stats:

> withinRunStats = function(x) c(sum = sum(x), mean = mean(x), var = var(x), n = length(x))
> tapply(runs, group, withinRunStats)
$`1`
         sum         mean          var            n 
29.800000000  9.933333333  0.005833333  3.000000000 

$`2`
  sum  mean   var     n 
28.20  9.40  0.31  3.00 

$`3`
  sum  mean   var     n 
32.10 10.70  0.61  3.00 

$`4`
        sum        mean         var           n 
29.80000000  9.93333333  0.04333333  3.00000000

You can do some ANOVA here:

> data = data.frame(y = runs, group = factor(group))
> data
       y group
1   9.85     1
2   9.95     1
3  10.00     1
4   9.90     2
5   8.80     2
6   9.50     2
7  11.20     3
8  11.10     3
9   9.80     3
10  9.70     4
11 10.10     4
12 10.00     4

> fit = lm(runs ~ group, data)
> fit

Call:
lm(formula = runs ~ group, data = data)

Coefficients:
(Intercept)       group2       group3       group4  
  9.933e+00   -5.333e-01    7.667e-01   -2.448e-15 

> anova(fit)
Analysis of Variance Table

Response: runs
          Df  Sum Sq Mean Sq F value  Pr(>F)  
group      3 2.57583 0.85861  3.5437 0.06769 .
Residuals  8 1.93833 0.24229                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> degreesOfFreedom = anova(fit)[, "Df"]
> names(degreesOfFreedom) = c("treatment", "error")
> degreesOfFreedom
treatment     error 
        3         8

Error or within-group variance:

> anova(fit)["Residuals", "Mean Sq"]
[1] 0.2422917

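Treatment or between-group mean square (not yet the between-run variance itself: for a balanced design that component is estimated as this value minus the error mean square, divided by the 3 replicates per run):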

> anova(fit)["group", "Mean Sq"]
[1] 0.8586111

This should give you enough confidence to do confidence intervals.
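To get from these mean squares to variance components, one common route for a balanced layout is the method-of-moments estimate: the within-run variance is the residual mean square, and the between-run variance is the difference of the two mean squares divided by the number of replicates per run. The sketch below assumes a one-way random-effects model (runs treated as a random sample of possible runs). The names `ms_between`, `n_rep`, `within_ci` and so on are my own, not from the answer. The interval for the within-run variance is the usual chi-square interval.

```r
# Data from the question: 4 runs x 3 replicates (balanced).
value <- c(9.85, 9.95, 10.00, 9.90, 8.80, 9.50,
           11.20, 11.10, 9.80, 9.70, 10.10, 10.00)
run   <- factor(rep(1:4, each = 3))
n_rep <- 3  # replicates per run

tab        <- anova(lm(value ~ run))
ms_between <- tab["run", "Mean Sq"]
ms_within  <- tab["Residuals", "Mean Sq"]
df_within  <- tab["Residuals", "Df"]

# Method-of-moments variance components (assumes a one-way random-effects model).
var_within  <- ms_within
var_between <- (ms_between - ms_within) / n_rep
var_total   <- var_within + var_between

# 95% chi-square confidence interval for the within-run variance (lower, upper).
alpha     <- 0.05
within_ci <- df_within * ms_within /
  qchisq(c(1 - alpha / 2, alpha / 2), df_within)

print(c(within = var_within, between = var_between, total = var_total))
print(within_ci)
```

If `var_between` comes out negative for some data set, that usually means the between-run component is too small to estimate from so few runs, and it is typically reported as zero.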

answered Oct 05 '22 06:10

Alex Reynolds

If you want to apply a function (such as var) across a factor such as Run or Rep, you can use tapply:

> with(variance, tapply(Value, Run, var))
          1           2           3           4 
0.005833333 0.310000000 0.610000000 0.043333333 
> with(variance, tapply(Value, Rep, var))
          1          2          3 
0.48562500 0.88729167 0.05583333
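In a balanced design the per-run variances connect directly to the ANOVA table: their plain average is the pooled within-run variance, which equals the residual mean square. A small sketch, rebuilding the question's data frame under the name `variance` (the names `run_vars`, `pooled_within` and `ms_resid` are my own):

```r
variance <- data.frame(
  Run   = rep(1:4, each = 3),
  Rep   = rep(1:3, times = 4),
  Value = c(9.85, 9.95, 10.00, 9.90, 8.80, 9.50,
            11.20, 11.10, 9.80, 9.70, 10.10, 10.00)
)

# Variance of the replicates inside each run.
run_vars <- with(variance, tapply(Value, Run, var))

# Equal group sizes, so the pooled variance is the unweighted mean.
pooled_within <- mean(run_vars)

# Residual mean square from the one-way ANOVA, for comparison.
ms_resid <- anova(lm(Value ~ factor(Run), data = variance))["Residuals", "Mean Sq"]

print(pooled_within)
print(ms_resid)
```

With unequal numbers of replicates the unweighted mean no longer applies; each run's variance would need to be weighted by its degrees of freedom (replicates minus one).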

answered Oct 05 '22 06:10

Jonathan Chang

I'm going to take a crack at this when I have more time, but meanwhile, here's the dput() for Kiar's data structure:

structure(list(Run = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4), Rep = c(1, 
2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3), Value = c(9.85, 9.95, 10, 9.9, 
8.8, 9.5, 11.2, 11.1, 9.8, 9.7, 10.1, 10)), .Names = c("Run", 
"Rep", "Value"), row.names = c(NA, -12L), class = "data.frame")

... in case you'd like to take a quick shot at it.
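One thing to watch when using this structure: `Run` is stored as a number, so a model formula treats it as a continuous predictor unless it is converted to a factor. A short sketch of the difference, with the same values rebuilt via `data.frame()` (the object names `kiar`, `df_numeric` and `df_factor` are my own choices):

```r
kiar <- data.frame(
  Run   = rep(1:4, each = 3),
  Rep   = rep(1:3, times = 4),
  Value = c(9.85, 9.95, 10.00, 9.90, 8.80, 9.50,
            11.20, 11.10, 9.80, 9.70, 10.10, 10.00)
)

# Run as a number: a single slope, so 1 degree of freedom for Run.
df_numeric <- anova(lm(Value ~ Run, data = kiar))[, "Df"]

# Run as a factor: one level per run, so 3 degrees of freedom for Run.
kiar$Run  <- factor(kiar$Run)
df_factor <- anova(lm(Value ~ Run, data = kiar))[, "Df"]

print(df_numeric)
print(df_factor)
```

Only the factor version gives the 3 and 8 degrees of freedom that the ANOVA tables elsewhere on this page show.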

answered Oct 05 '22 08:10

Matt Parker

I've been looking at a similar problem. I've found a reference on calculating confidence intervals by Burdick and Graybill (Burdick, R. and Graybill, F. 1992, Confidence Intervals on Variance Components, CRC Press).

Using some code I've been trying, I get these values:



> kiaraov = aov(Value~Run+Error(Run),data=kiar)

> summary(kiaraov)

Error: Run
    Df  Sum Sq Mean Sq
Run  3 2.57583 0.85861

Error: Within
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  8 1.93833 0.24229               
> confint = 95

> a = (1-(confint/100))/2

> grandmean = as.vector(kiaraov$"(Intercept)"[[1]][1]) # Grand Mean (I think)

> within = summary(kiaraov)$"Error: Within"[[1]]$"Mean Sq"  # S2^2, mean square within run

> dfRun = summary(kiaraov)$"Error: Run"[[1]]$"Df"

> dfWithin = summary(kiaraov)$"Error: Within"[[1]]$"Df"

> Run = summary(kiaraov)$"Error: Run"[[1]]$"Mean Sq" # S1^2, mean square between runs

> between = (Run-within)/((dfWithin/(dfRun+1))+1) # (S1^2-S2^2)/J

> total = between+within

> between # Between Run Variance
[1] 0.2054398

> within # Within Run Variance
[1] 0.2422917

> total # Total Variance
[1] 0.4477315

> betweenCV = sqrt(between)/grandmean * 100 # Between Run CV%

> withinCV = sqrt(within)/grandmean * 100 # Within Run CV%

> totalCV = sqrt(total)/grandmean * 100 # Total CV%

> #within confidence intervals

> withinLCB = within/qf(1-a,8,Inf) # Within LCB

> withinUCB = within/qf(a,8,Inf) # Within UCB

> #Between Confidence Intervals

> n1 = dfRun

> n2 = dfWithin

> G1 = 1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this should be a

> G2 = 1-(1/qf(1-a,n2,Inf))

> H1 = (1/qf(a,n1,Inf))-1  # and this should be 1-a, but my results don't agree

> H2 = (1/qf(a,n2,Inf))-1

> G12 = ((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) # again, should be a, not 1-a

> H12 = ((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again, should be 1-a, not a

> Vu = H1^2*Run^2+G2^2*within^2+H12*Run*within

> Vl = G1^2*Run^2+H2^2*within^2+G12*within*Run

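> J = (dfWithin/(dfRun+1))+1 # replicates per run (3 here); J is not defined earlier in this transcript

> betweenLCB = (Run-within-sqrt(Vl))/J # Between LCB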

> betweenUCB = (Run-within+sqrt(Vu))/J # Between UCB

> #Total Confidence Intervals

> y = (Run+(J-1)*within)/J

> totalLCB = y-(sqrt(G1^2*Run^2+G2^2*(J-1)^2*within^2)/J) # Total LCB

> totalUCB = y+(sqrt(H1^2*Run^2+H2^2*(J-1)^2*within^2)/J) # Total UCB

> result = data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*100,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(withinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandmean*100))

> result
     Name       CV      LCB      UCB
1  within 4.926418 3.327584  9.43789
2 between 4.536327      NaN 19.73568
3   total 6.696855 4.846030 20.42647

Here the lower confidence bound for the between-run variance is negative, so its square root (and hence the between-run CV) is reported as NaN.
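A negative lower bound can turn up with this kind of interval when the between-run mean square is not much larger than the within-run one. Since a variance cannot be negative, a common convention is to truncate the bound at zero before taking the square root. The sketch below repeats the balanced-case formulas from the transcript in a self-contained form and applies that truncation. The truncation step and the variable names (`S1`, `S2`, `between_lcb_raw`, `cv`) are my additions, not something the transcript does.

```r
value <- c(9.85, 9.95, 10.00, 9.90, 8.80, 9.50,
           11.20, 11.10, 9.80, 9.70, 10.10, 10.00)
run   <- factor(rep(1:4, each = 3))
J     <- 3               # replicates per run
a     <- (1 - 0.95) / 2  # tail probability, as in the transcript

tab <- anova(lm(value ~ run))
S1  <- tab["run", "Mean Sq"];       n1 <- tab["run", "Df"]
S2  <- tab["Residuals", "Mean Sq"]; n2 <- tab["Residuals", "Df"]

# Constants written the same way as in the transcript.
G1  <- 1 - 1 / qf(1 - a, n1, Inf)
G2  <- 1 - 1 / qf(1 - a, n2, Inf)
H1  <- 1 / qf(a, n1, Inf) - 1
H2  <- 1 / qf(a, n2, Inf) - 1
Fu  <- qf(1 - a, n1, n2)
Fl  <- qf(a, n1, n2)
G12 <- ((Fu - 1)^2 - G1^2 * Fu^2 - H2^2) / Fu
H12 <- ((1 - Fl)^2 - H1^2 * Fl^2 - G2^2) / Fl

Vl <- G1^2 * S1^2 + H2^2 * S2^2 + G12 * S1 * S2
Vu <- H1^2 * S1^2 + G2^2 * S2^2 + H12 * S1 * S2

between_lcb_raw <- (S1 - S2 - sqrt(Vl)) / J  # negative for these data
between_ucb     <- (S1 - S2 + sqrt(Vu)) / J

# Assumed convention: a variance bound below zero is reported as zero.
between_lcb <- max(between_lcb_raw, 0)

# Convert a variance to a percent CV relative to the grand mean.
cv <- function(v) sqrt(v) / mean(value) * 100
print(c(LCB = cv(between_lcb), UCB = cv(between_ucb)))
```

For these data the lower bound for the between-run CV then comes out as 0 rather than NaN, while the upper bound is unchanged.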

I'd love to have a better way to do this. If I get time I might try to create a function to do this.

Paul.

Edit: I did eventually write a function, here it is (caveat emptor)

library(plyr)  # ddply() and colwise() used below come from the plyr package

#' avar Function
#' 
#' Calculate the within, between and total %CV of a dataset by ANOVA, and the
#' associated confidence intervals
#' 
#' @param dataf - The data frame to use, in long format 
#' @param afactor Character string representing the column in dataf that contains the factor
#' @param aresponse  Character string representing the column in dataf that contains the response value
#' @param aconfidence What Confidence limits to use, default = 95%
#' @param digits  Significant Digits to report to, default = 3
#' @param debug Boolean, Should debug messages be displayed, default=FALSE
#' @returnType dataframe containing the Mean, Within, Between and Total %CV and LCB and UCB for each
#' @return 
#' @author Paul Hurley
#' @export
#' @examples 
#' #Using the BGBottles data from Burdick and Graybill Page 62
#' avar(dataf=BGBottles, afactor="Machine", aresponse="weight")
avar<-function(dataf, afactor, aresponse, aconfidence=95, digits=3, debug=FALSE){
    dataf<-subset(dataf,!is.na(with(dataf,get(aresponse))))
    nmissing<-function(x) sum(!is.na(x))
    n<-nrow(subset(dataf,is.numeric(with(dataf,get(aresponse)))))
    datadesc<-ddply(dataf, afactor, colwise(nmissing,aresponse))
    I<-nrow(datadesc)
    if(debug){print(datadesc)}
    if(min(datadesc[,2])==max(datadesc[,2])){
        balance<-TRUE
        J<-min(datadesc[,2])
        if(debug){message(paste("Dataset is balanced, J=",J,"I is ",I,sep=""))}
    } else {
        balance<-FALSE
        Jh<-I/(sum(1/datadesc[,2], na.rm = TRUE))
        J<-Jh
        m<-min(datadesc[,2])
        M<-max(datadesc[,2])
        if(debug){message(paste("Dataset is unbalanced, like me, I is ",I,sep=""))}
        if(debug){message(paste("Jh is ",Jh, ", m is ",m, ", M is ",M, sep=""))}
    }
    if(debug){message(paste("Call afactor=",afactor,", aresponse=",aresponse,sep=""))}
    formulatext<-paste(as.character(aresponse)," ~ 1 + Error(",as.character(afactor),")",sep="")
    if(debug){message(paste("formula text is ",formulatext,sep=""))}
    aovformula<-formula(formulatext)
    if(debug){message(paste("Formula is ",as.character(aovformula),sep=""))}
    assayaov<-aov(formula=aovformula,data=dataf)
    if(debug){
        print(assayaov)
        print(summary(assayaov))
    }
    a<-1-((1-(aconfidence/100))/2)
    if(debug){message(paste("confidence is ",aconfidence,", alpha is ",a,sep=""))}
    grandmean<-as.vector(assayaov$"(Intercept)"[[1]][1]) # Grand Mean (I think)
    if(debug){message(paste("n is",n,sep=""))}

    #This line commented out, seems to choke with an aov object built from an external formula
    #grandmean<-as.vector(model.tables(assayaov,type="means")[[1]]$`Grand mean`) # Grand Mean (I think)
    within<-summary(assayaov)[[2]][[1]]$"Mean Sq"  # d2e, S2^2 Mean Square Value for Within Machine = 0.1819
    dfRun<-summary(assayaov)[[1]][[1]]$"Df"  # DF for Run (between) = 3
    dfWithin<-summary(assayaov)[[2]][[1]]$"Df"  # DF for within = 8
    Run<-summary(assayaov)[[1]][[1]]$"Mean Sq" # S1^2Mean Square for Machine
    if(debug){message(paste("mean square for Run ?",Run,sep=""))}
    #Was between<-(Run-within)/((dfWithin/(dfRun+1))+1) but my comment suggests this should be just J, so I'll use J !
    between<-(Run-within)/J # d2a (S1^2-S2^2)/J
    if(debug){message(paste("S1^2 mean square machine is ",Run,", S2^2 mean square within is ",within))}
    total<-between+within
    between # Between Run Variance
    within # Within Run Variance
    total # Total Variance
    if(debug){message(paste("between is ",between,", within is ",within,", Total is ",total,sep=""))}

    betweenCV<-sqrt(between)/grandmean * 100 # Between Run CV%
    withinCV<-sqrt(within)/grandmean * 100 # Within Run CV%
    totalCV<-sqrt(total)/grandmean * 100 # Total CV%
    n1<-dfRun
    n2<-dfWithin
    if(debug){message(paste("n1 is ",n1,", n2 is ",n2,sep=""))}
    #within confidence intervals
    if(balance){
        withinLCB<-within/qf(a,n2,Inf) # Within LCB
        withinUCB<-within/qf(1-a,n2,Inf) # Within UCB
    } else {
        withinLCB<-within/qf(a,n2,Inf) # Within LCB
        withinUCB<-within/qf(1-a,n2,Inf) # Within UCB
    }
#Mean Confidence Intervals
    if(debug){message(paste(grandmean,"+/-(sqrt(",Run,"/",n,")*qt(",a,",df=",I-1,"))",sep=""))} 
    meanLCB<-grandmean+(sqrt(Run/n)*qt(1-a,df=I-1)) # wrong
    meanUCB<-grandmean-(sqrt(Run/n)*qt(1-a,df=I-1)) # wrong
    if(debug){message(paste("Grandmean is ",grandmean,", meanLCB = ",meanLCB,", meanUCB = ",meanUCB,aresponse,sep=""))}
    if(debug){print(summary(assayaov))}
#Between Confidence Intervals
    G1<-1-(1/qf(a,n1,Inf)) 
    G2<-1-(1/qf(a,n2,Inf))
    H1<-(1/qf(1-a,n1,Inf))-1  
    H2<-(1/qf(1-a,n2,Inf))-1
    G12<-((qf(a,n1,n2)-1)^2-(G1^2*qf(a,n1,n2)^2)-(H2^2))/qf(a,n1,n2) 
    H12<-((1-qf(1-a,n1,n2))^2-H1^2*qf(1-a,n1,n2)^2-G2^2)/qf(1-a,n1,n2) 
    if(debug){message(paste("G1 is ",G1,", G2 is ",G2,sep=""))
        message(paste("H1 is ",H1,", H2 is ",H2,sep=""))
        message(paste("G12 is ",G12,", H12 is ",H12,sep=""))
    }
    if(balance){
        Vu<-H1^2*Run^2+G2^2*within^2+H12*Run*within
        Vl<-G1^2*Run^2+H2^2*within^2+G12*within*Run
        betweenLCB<-(Run-within-sqrt(Vl))/J # Between LCB
        betweenUCB<-(Run-within+sqrt(Vu))/J # Between UCB
    } else {
        #Burdick and Graybill seem to suggest calculating anova of mean values to find n1S12u/Jh
        meandataf<-ddply(.data=dataf,.variable=afactor, .fun=function(df){mean(with(df, get(aresponse)), na.rm=TRUE)})
        meandataaov<-aov(formula(paste("V1~",afactor,sep="")), data=meandataf)
        sumsquare<-summary(meandataaov)[[1]]$`Sum Sq`
        #so maybe S12u is just that bit ?
        Runu<-(sumsquare*Jh)/n1
        if(debug){message(paste("n1S12u/Jh is ",sumsquare,", so S12u is ",Runu,sep=""))}
        Vu<-H1^2*Runu^2+G2^2*within^2+H12*Runu*within
        Vl<-G1^2*Runu^2+H2^2*within^2+G12*within*Runu
        betweenLCB<-(Runu-within-sqrt(Vl))/Jh # Between LCB
        betweenUCB<-(Runu-within+sqrt(Vu))/Jh # Between UCB
        if(debug){message(paste("betweenLCB is ",betweenLCB,", between UCB is ",betweenUCB,sep=""))}
    }
#Total Confidence Intervals
    if(balance){
        y<-(Run+(J-1)*within)/J
        if(debug){message(paste("y is ",y,sep=""))}
        totalLCB<-y-(sqrt(G1^2*Run^2+G2^2*(J-1)^2*within^2)/J) # Total LCB
        totalUCB<-y+(sqrt(H1^2*Run^2+H2^2*(J-1)^2*within^2)/J) # Total UCB
    } else {
        y<-(Runu+(Jh-1)*within)/Jh
        if(debug){message(paste("y is ",y,sep=""))}
        totalLCB<-y-(sqrt(G1^2*Runu^2+G2^2*(Jh-1)^2*within^2)/Jh) # Total LCB
        totalUCB<-y+(sqrt(H1^2*Runu^2+H2^2*(Jh-1)^2*within^2)/Jh) # Total UCB
    }
    if(debug){message(paste("totalLCB is ",totalLCB,", total UCB is ",totalUCB,sep=""))}
#   result<-data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV),
#           LCB=c(sqrt(withinLCB)/grandmean*100,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),
#           UCB=c(sqrt(withinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandmean*100))
    result<-data.frame(Mean=grandmean,MeanLCB=meanLCB, MeanUCB=meanUCB, Within=withinCV,WithinLCB=sqrt(withinLCB)/grandmean*100, WithinUCB=sqrt(withinUCB)/grandmean*100,
            Between=betweenCV, BetweenLCB=sqrt(betweenLCB)/grandmean*100, BetweenUCB=sqrt(betweenUCB)/grandmean*100,
            Total=totalCV, TotalLCB=sqrt(totalLCB)/grandmean*100, TotalUCB=sqrt(totalUCB)/grandmean*100)
    if(!digits=="NA"){
        result$Mean<-signif(result$Mean,digits=digits)
        result$MeanLCB<-signif(result$MeanLCB,digits=digits)
        result$MeanUCB<-signif(result$MeanUCB,digits=digits)
        result$Within<-signif(result$Within,digits=digits)
        result$WithinLCB<-signif(result$WithinLCB,digits=digits)
        result$WithinUCB<-signif(result$WithinUCB,digits=digits)
        result$Between<-signif(result$Between,digits=digits)
        result$BetweenLCB<-signif(result$BetweenLCB,digits=digits)
        result$BetweenUCB<-signif(result$BetweenUCB,digits=digits)
        result$Total<-signif(result$Total,digits=digits)
        result$TotalLCB<-signif(result$TotalLCB,digits=digits)
        result$TotalUCB<-signif(result$TotalUCB,digits=digits)
    }
    return(result)
}

assayvar<-function(adata, aresponse, afactor, anominal, aconfidence=95, digits=3, debug=FALSE){
    result<-ddply(adata,anominal,function(df){
                resul<-avar(dataf=df,afactor=afactor,aresponse=aresponse,aconfidence=aconfidence, digits=digits, debug=debug)
                resul$n<-nrow(subset(df, !is.na(with(df, get(aresponse)))))
                return(resul)
            })
    return(result)
}

answered Oct 05 '22 06:10

PaulHurleyuk
