I've been taught to run an ANOVA with the formula: aov(dependent variable ~ independent variable, dataset),
but I am struggling to run one on a particular dataset because it is split into three columns of values. The columns are named Newborn, adolescent and adult (the hamster's age), and the values in each column are blood pressure readings. I need to run a test to determine whether there is a relationship between blood pressure and age.
This is what the data looks like in R:
> hamster
Newborn adolescent adult
1 108 110 105
2 110 105 100
3 90 100 95
4 80 90 85
5 100 102 97
6 120 110 105
7 125 105 100
8 130 115 110
9 120 100 95
10 130 120 115
11 145 130 125
12 150 125 120
13 130 135 130
14 155 130 125
15 140 120 115
I'm confused because the dependent variable is the set of values within each of those columns.
In short: aov fits a model (as you are already aware, internally it calls lm), so it produces regression coefficients, fitted values, residuals, etc. The object it returns has primary class "aov" and secondary class "lm", so it is an augmented "lm" object. anova, on the other hand, is a generic function.
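A minimal sketch of that relationship, using a small made-up data frame (the names d, y and g are placeholders for illustration, not the hamster data):

```r
# Illustrative data only: six invented values in three groups.
d <- data.frame(
  y = c(5.1, 4.9, 6.2, 6.0, 7.3, 7.1),
  g = factor(rep(c("a", "b", "c"), each = 2))
)

fit_aov <- aov(y ~ g, data = d)  # fitted through aov()
fit_lm  <- lm(y ~ g, data = d)   # the same model fitted directly with lm()

class(fit_aov)                          # "aov" "lm"
all.equal(coef(fit_aov), coef(fit_lm))  # TRUE: identical coefficients

anova(fit_lm)  # the generic anova() dispatches on the "lm" object
```

Both routes fit the same linear model; they differ mainly in how the result is summarised and printed.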
A dataset can be laid out in one of two formats: wide or long. In wide format the values in the first column do not repeat, so each one is unique. In long format the values in the first column do repeat.
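To make the distinction concrete, here is a rough sketch using the first three rows of the hamster data and base R's reshape. The column names id, age and bp are my own choices for illustration, and the id column is an assumption added by reshape, since the original data has no identifier column:

```r
# First three rows of the hamster data in wide format:
# one row per hamster, one column per age group.
wide <- data.frame(
  Newborn    = c(108, 110, 90),
  adolescent = c(110, 105, 100),
  adult      = c(105, 100, 95)
)

# Long format: one row per measurement. reshape() creates the "id"
# column itself because `wide` has no such column.
long <- reshape(
  wide,
  direction = "long",
  varying   = list(names(wide)),  # the three columns to collapse
  v.names   = "bp",               # name for the stacked values
  timevar   = "age",              # name for the column of old column names
  times     = names(wide),
  idvar     = "id"
)
long <- long[, c("id", "age", "bp")]  # put the identifier first
rownames(long) <- NULL

dim(wide)  # 3 rows, 3 columns
dim(long)  # 9 rows, 3 columns: each id now appears three times
```

In the long version every blood pressure value sits in a single bp column with its age group beside it, which is the shape a formula such as bp ~ age expects.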
The first step is to rearrange your data so it's in a "long" format instead of a "wide" format. This can be done in base R using the reshape function, but it's much easier to use the gather function in the tidyr package:
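library(tidyr)

# gather() collapses the three age columns into key/value columns (age, bp)
result <- hamster %>%
  gather(age, bp) %>%
  aov(bp ~ age, .)   # "." passes the reshaped data as the data argument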
Using tidyr also gives us the pipe operator (%>%), which lets you chain commands together in a readable way. By default, it takes the result of the previous function and inserts it as the first argument of the next function. In the aov call, we override this with the . placeholder to explicitly pass the data set produced by gather as the second argument.
R has a useful function called stack to convert your data into the format needed for ANOVA.
aov(values ~ ind, stack(hamster))
# Call:
#
# aov(formula = values ~ ind, data = stack(hamster))
#
# Terms:
# ind Residuals
# Sum of Squares 1525.378 11429.867
# Deg. of Freedom 2 42
#
# Residual standard error: 16.49666
# Estimated effects may be unbalanced
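Printing the aov object shows only the sums of squares. To get the F statistic and p-value that the question asks about, one option is to call summary on the fit. The sketch below rebuilds the data frame from the values posted in the question; the variable names fit and tab are my own:

```r
# Hamster blood pressure data as posted in the question (wide format).
hamster <- data.frame(
  Newborn    = c(108, 110, 90, 80, 100, 120, 125, 130, 120, 130,
                 145, 150, 130, 155, 140),
  adolescent = c(110, 105, 100, 90, 102, 110, 105, 115, 100, 120,
                 130, 125, 135, 130, 120),
  adult      = c(105, 100, 95, 85, 97, 105, 100, 110, 95, 115,
                 125, 120, 130, 125, 115)
)

# stack() returns a long data frame with columns "values" and "ind".
fit <- aov(values ~ ind, data = stack(hamster))

# summary() adds mean squares, the F value and Pr(>F) to the table.
summary(fit)

# The ANOVA table is the first element of the summary object.
tab <- summary(fit)[[1]]
tab[["Df"]]          # 2 42
tab[["Pr(>F)"]][1]   # p-value for the age effect
```

Note that this treats the three columns as independent groups. If each row is the same hamster measured at three ages, a repeated-measures design may be more appropriate, but the question does not say.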