I have a data set that begins like this:
> d.weight
R N P C D.weight
1 1 0 0 GO 45.3
2 2 0 0 GO 34.0
3 3 0 0 GO 19.1
4 4 0 0 GO 26.6
5 5 0 0 GO 23.5
6 1 45 0 GO 22.1
7 2 45 0 GO 15.5
8 3 45 0 GO 23.4
9 4 45 0 GO 15.8
10 5 45 0 GO 42.9
...
and so on.
R
is replicate and there are 5 of them (1-5).N
is nitrogen level, and there are 5 as well (0, 45, 90, 180, 360).P
is phosphorus level, and there are 5 as well (0, 35, 70, 140, 280).C
is plant combination, and there are 4 (GO, GB, LO, LB).D.weight
is dry weight in grams.However, when I do an ANOVA I get the wrong degrees of freedom. I usually run my ANOVAs on subsets of that full set of data, but let's just do an analysis I wouldn't actually do otherwise, just so you can see that almost all of the Df (degrees of freedom) are wrong.
> example.aov=aov(D.weight ~ R+N+P+C, data=d.weight)
> summary(example.aov)
Df Sum Sq Mean Sq F value Pr(>F)
R 1 1158 1158 9.484 0.00226 **
N 1 202 202 1.657 0.19900
P 1 11040 11040 90.408 < 2e-16 ***
C 3 41032 13677 112.010 < 2e-16 ***
Residuals 313 38220 122
So, basically, the only one that's right is the C factor. Is it because it has letters instead of numbers?
I found somewhere that if I write interaction()
with each term, I get the right Df, but I don't know if that's the right thing to do overall. For example:
> example.aov2=aov(D.weight ~ interaction(R)+interaction(N)+interaction(P)+interaction(C), data=d.weight)
> summary(example.aov2)
Df Sum Sq Mean Sq F value Pr(>F)
interaction(R) 4 7423 1856 19.544 2.51e-14 ***
interaction(N) 4 543 136 1.429 0.224
interaction(P) 4 13788 3447 36.301 < 2e-16 ***
interaction(C) 3 41032 13677 144.042 < 2e-16 ***
Residuals 304 28866 95
I tried it with the C
factor only to see if it messed up anything:
> example.aov3=aov(D.weight ~ C, data=d.weight)
> summary(example.aov3)
Df Sum Sq Mean Sq F value Pr(>F)
C 3 41032 13677 85.38 <2e-16 ***
Residuals 316 50620 160
>
> example.aov4=aov(D.weight ~ interaction(C), data=d.weight)
> summary(example.aov4)
Df Sum Sq Mean Sq F value Pr(>F)
interaction(C) 3 41032 13677 85.38 <2e-16 ***
Residuals 316 50620 160
And it looks the same. Should I be adding interaction()
everywhere?
R determines whether it should treat variables as categorical (ANOVA-type analysis) or continuous (regression-type analysis) by checking whether they are numeric
or factor
variables. Most simply, you can convert your predictor (independent) variables to factors via
facs <- c("R","N","P")
d.weight[facs] <- lapply(d.weight[facs],factor)
If you want to create auxiliary variables instead of overwriting you could do something like
for (varname in facs) {
d.weight[[paste0("f",varname)]] <- factor(d.weight[[varname]])
}
There might be a more compact way to do it but that should serve ...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With