
How to run ANOVA on a wide format data.frame?

I've been taught to run an ANOVA with the formula: aov(dependent variable~independent variable, dataset)

but I am struggling to run an ANOVA on a particular dataset because it is split into three columns, each holding values. The three columns are named Newborn, adolescent and adult (the hamster's age group), and the values within each column are blood pressure measurements. I need to run a test to determine whether there is a relationship between blood pressure and age.

This is what the data looks like in R:

> hamster
   Newborn adolescent adult
1      108        110   105
2      110        105   100
3       90        100    95
4       80         90    85
5      100        102    97
6      120        110   105
7      125        105   100
8      130        115   110
9      120        100    95
10     130        120   115
11     145        130   125
12     150        125   120
13     130        135   130
14     155        130   125
15     140        120   115

I'm confused because the dependent variable is the set of values shown above within each column, rather than a single column of its own.

Victoria Fletcher asked Apr 29 '18 23:04



2 Answers

The first step is to rearrange your data so it's in a "long" format instead of a "wide" format. This can be done in base R using the reshape function, but it's much easier to use the gather function in the tidyr package:

library(tidyr)
# gather() stacks all three columns into key/value pairs: age (column name) and bp (value)
result <- hamster %>%
  gather(age, bp) %>%
  aov(bp ~ age, .)

Loading tidyr also gives us the pipe operator (%>%), which lets you chain commands together readably. By default, it takes the result of the previous function and inserts it as the first argument of the next function. In the aov call, we override this with the . placeholder so that the data set returned by gather is passed explicitly as the second argument.
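In more recent versions of tidyr (1.0.0 and later), gather is superseded by pivot_longer. A minimal sketch of the same reshaping, assuming the hamster data frame from the question is rebuilt by hand as below; the names hamster_long and fit are only illustrative:

```r
library(tidyr)

# Data copied from the question so the example is self-contained
hamster <- data.frame(
  Newborn    = c(108, 110, 90, 80, 100, 120, 125, 130, 120, 130, 145, 150, 130, 155, 140),
  adolescent = c(110, 105, 100, 90, 102, 110, 105, 115, 100, 120, 130, 125, 135, 130, 120),
  adult      = c(105, 100, 95, 85, 97, 105, 100, 110, 95, 115, 125, 120, 130, 125, 115)
)

# Stack all three columns: column names go to `age`, cell values go to `bp`
hamster_long <- pivot_longer(hamster,
                             cols = everything(),
                             names_to = "age",
                             values_to = "bp")

# One-way ANOVA of blood pressure on age group
fit <- aov(bp ~ age, data = hamster_long)
summary(fit)
```

The result has one row per measurement (45 rows here), which is the shape aov expects for a formula such as bp ~ age.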

Melissa Key answered Sep 22 '22 06:09


R has a useful function called stack to convert your data format into the one needed for ANOVA.

aov(values ~ ind, stack(hamster))

# Call:
#
# aov(formula = values ~ ind, data = stack(hamster))
#
# Terms:
#                       ind Residuals
# Sum of Squares   1525.378 11429.867
# Deg. of Freedom         2        42
#
# Residual standard error: 16.49666
# Estimated effects may be unbalanced
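Printing the aov object only shows the sums of squares. To get the F statistic and p-value needed to judge whether blood pressure differs by age, the fit can be passed to summary. A small sketch, assuming the hamster data frame from the question is rebuilt by hand; the names long, fit and tab are only illustrative:

```r
# Data copied from the question so the example is self-contained
hamster <- data.frame(
  Newborn    = c(108, 110, 90, 80, 100, 120, 125, 130, 120, 130, 145, 150, 130, 155, 140),
  adolescent = c(110, 105, 100, 90, 102, 110, 105, 115, 100, 120, 130, 125, 135, 130, 120),
  adult      = c(105, 100, 95, 85, 97, 105, 100, 110, 95, 115, 125, 120, 130, 125, 115)
)

# stack() returns two columns: `values` (the numbers) and `ind` (the source column as a factor)
long <- stack(hamster)

fit <- aov(values ~ ind, data = long)

# summary() prints the ANOVA table, including the F value and Pr(>F)
tab <- summary(fit)[[1]]
print(tab)
```

A small Pr(>F) would suggest that mean blood pressure is not the same in all three age groups.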
Karolis Koncevičius answered Sep 22 '22 06:09