
Repeated-Measures ANOVA: ezANOVA vs. aov vs. lme syntax

This question is about both syntax and semantics, so there is also a (so far unanswered) duplicate on Cross Validated: https://stats.stackexchange.com/questions/113324/repeated-measures-anova-ezanova-vs-aov-vs-lme-syntax

In a machine-learning setting, I evaluated 4 classifiers on the same 5 datasets, i.e., each classifier returned one performance measure for each of datasets 1 to 5. Now I want to know whether the classifiers differ significantly in their performance. Here's some toy data:

Performance<-c(2,3,3,2,3,1,2,2,1,1,3,1,3,2,3,2,1,2,1,2)
Dataset<-factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))
Classifier<-factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
data<-data.frame(Classifier,Dataset,Performance)
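Before deciding which factor plays the role of "subject", it can help to look at the layout of the design. The base-R sketch below cross-tabulates the toy data (the names `counts` and `wide` are illustrative, not from the question): every Classifier-by-Dataset cell holds exactly one value, so the design is fully crossed with no replication, and the wide table shows one row per classifier and one column per dataset.

```r
Performance <- c(2,3,3,2,3,1,2,2,1,1,3,1,3,2,3,2,1,2,1,2)
Dataset <- factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))
Classifier <- factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
data <- data.frame(Classifier, Dataset, Performance)

# Number of observations in each Classifier-by-Dataset cell (all should be 1)
counts <- xtabs(~ Classifier + Dataset, data = data)
print(counts)

# Performance in wide form: rows = classifiers, columns = datasets
wide <- xtabs(Performance ~ Classifier + Dataset, data = data)
print(wide)
```

Because both factors are crossed in the same way, either one can be passed as the subject identifier without any error; only the research question decides which is correct.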

Following a textbook, I conducted a one-way repeated-measures ANOVA. I treated performance as the dependent variable, the classifiers as subjects and the datasets as the within-subjects factor. Using aov, I got:

model <- aov(Performance ~ Classifier + Error(factor(Dataset)), data=data)
summary(model)

Yielding the following output:

Error: factor(Dataset)
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4    2.5   0.625               

Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)  
Classifier  3    5.2  1.7333   4.837 0.0197 *
Residuals  12    4.3  0.3583 
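The numbers in this table can be reproduced by hand, which makes explicit what `Error(factor(Dataset))` does: the total sum of squares is split into a Dataset part (the error stratum for subjects), a Classifier part and a Classifier-by-Dataset residual, and Classifier is tested against that residual. A base-R sketch of that standard one-way repeated-measures decomposition (variable names are illustrative):

```r
Performance <- c(2,3,3,2,3,1,2,2,1,1,3,1,3,2,3,2,1,2,1,2)
Dataset <- factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))
Classifier <- factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))

grand <- mean(Performance)
n_clf <- nlevels(Classifier)  # 4 levels of the within factor
n_ds <- nlevels(Dataset)      # 5 "subjects" (blocks)

# Partition of the total sum of squares
ss_total <- sum((Performance - grand)^2)
ss_clf <- n_ds * sum((tapply(Performance, Classifier, mean) - grand)^2)
ss_ds <- n_clf * sum((tapply(Performance, Dataset, mean) - grand)^2)
ss_res <- ss_total - ss_clf - ss_ds  # Classifier-by-Dataset residual

# F-test for Classifier against the residual stratum
df_clf <- n_clf - 1
df_res <- (n_clf - 1) * (n_ds - 1)
F_clf <- (ss_clf / df_clf) / (ss_res / df_res)
p_clf <- pf(F_clf, df_clf, df_res, lower.tail = FALSE)

print(round(c(SS_Dataset = ss_ds, SS_Classifier = ss_clf,
              SS_Residual = ss_res, F = F_clf, p = p_clf), 4))
```

For these data the three sums of squares come out as 2.5, 5.2 and 4.3, and F(3, 12) is about 4.837, matching the aov output above.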

I get similar results when using a linear mixed-effects model:

library(nlme)
model <- lme(Performance ~ Classifier, random = ~1|Dataset/Classifier, data=data)
result <- anova(model)
result

I then tried to reproduce the results with ezANOVA in order to run Mauchly's test for sphericity:

library(ez)
ezANOVA(data=data, dv=.(Performance), wid=.(Classifier), within=.(Dataset), detailed=TRUE, type=3)

Yielding the following output:

        Effect DFn DFd  SSn SSd         F          p p<.05       ges
 1 (Intercept)   1   3 80.0 5.2 46.153846 0.00652049     * 0.8938547
 2     Dataset   4  12  2.5 4.3  1.744186 0.20497686       0.2083333
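These values can also be recomputed from the same three sums of squares as the aov table, which shows what happens when `wid` is Classifier and `within` is Dataset: the roles are swapped, so the 5.2 Classifier sum of squares becomes the between-subjects error term (used only to test the intercept), and the effect being tested is Dataset, against the 4.3 residual. A base-R sketch, assuming ezANOVA's usual univariate F-tests and its generalized eta squared definition (names are illustrative):

```r
Performance <- c(2,3,3,2,3,1,2,2,1,1,3,1,3,2,3,2,1,2,1,2)
Dataset <- factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))
Classifier <- factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))

grand <- mean(Performance)
n_clf <- nlevels(Classifier)  # now acting as the 4 "subjects"
n_ds <- nlevels(Dataset)      # now acting as the 5-level within factor

ss_total <- sum((Performance - grand)^2)
ss_clf <- n_ds * sum((tapply(Performance, Classifier, mean) - grand)^2)
ss_ds <- n_clf * sum((tapply(Performance, Dataset, mean) - grand)^2)
ss_res <- ss_total - ss_clf - ss_ds

# Intercept: grand-mean SS tested against between-"subject" (Classifier) variation
ss_int <- length(Performance) * grand^2
F_int <- ss_int / (ss_clf / (n_clf - 1))
p_int <- pf(F_int, 1, n_clf - 1, lower.tail = FALSE)

# Dataset: within factor tested against the Classifier-by-Dataset residual
df_ds <- n_ds - 1
df_res <- (n_clf - 1) * (n_ds - 1)
F_ds <- (ss_ds / df_ds) / (ss_res / df_res)
p_ds <- pf(F_ds, df_ds, df_res, lower.tail = FALSE)

# Generalized eta squared: effect SS over effect SS plus all error SS
ges <- c(ss_int, ss_ds) / (c(ss_int, ss_ds) + ss_clf + ss_res)

print(data.frame(Effect = c("(Intercept)", "Dataset"),
                 F = c(F_int, F_ds), p = c(p_int, p_ds), ges = ges))
```

The recomputed F, p and ges values agree with the ezANOVA table, which suggests this call is not malformed; it simply tests whether the datasets differ rather than whether the classifiers do.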

This clearly doesn't correspond to the earlier output from aov/lme. However, when I swap "Dataset" and "Classifier" in the ezANOVA call, I get the expected results.

I now wonder whether my textbook is wrong (the aov specification) or whether I misunderstood the ezANOVA syntax. Furthermore, why do I get Mauchly's test results only after rewriting the ezANOVA call, but not in the first case?

Chris asked Mar 19 '23 15:03


1 Answer

Since you want to compare classifiers and not datasets, the within-subjects factor is Classifier and the subject identifier (wid) is Dataset. So the correct syntax for your ezANOVA example would be:

library(ez)
ezANOVA(data=data, dv=.(Performance), within=.(Classifier), wid=.(Dataset), detailed=TRUE)

Btw, there is no need to specify the type of sums of squares. Since you have only one factor, all types of sums of squares will produce the same results anyway.
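The point about sums of squares can be checked in base R. In this balanced, fully crossed layout, the sequential (Type I) sums of squares that `anova()` reports for an additive `lm` fit do not depend on the order of the terms, and treating Dataset as a fixed blocking factor gives the same Classifier F-test as the aov model in the question. A minimal sketch; the `lm` formulation is a swapped-in equivalent for this balanced case, not something taken from the answer:

```r
Performance <- c(2,3,3,2,3,1,2,2,1,1,3,1,3,2,3,2,1,2,1,2)
Dataset <- factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))
Classifier <- factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
data <- data.frame(Classifier, Dataset, Performance)

# Same additive (randomized-block) model, terms entered in both orders
tab_a <- anova(lm(Performance ~ Dataset + Classifier, data = data))
tab_b <- anova(lm(Performance ~ Classifier + Dataset, data = data))
print(tab_a)
print(tab_b)

# Sequential SS for Classifier, entered first vs. entered last
print(c(first = tab_b["Classifier", "Sum Sq"], last = tab_a["Classifier", "Sum Sq"]))
```

Both orders give 5.2 for Classifier and 2.5 for Dataset, with F(3, 12) of about 4.837 for Classifier, the same test as the aov and corrected ezANOVA models. With unbalanced data (e.g. a missing dataset for one classifier) the orders would no longer agree, and the choice of type would start to matter.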

Erich Studerus answered Apr 01 '23 09:04