How do I remove NAs before calculating the standard deviation of a variable per group? I'm using ave() like so:
df$sd_grade1 <- ave(df$grade1, df$class, FUN = sd)
"Grade 1" is a numeric variable from 1 to 7 and it consists one NA value. I want to calculate the standard deviation of the variable "Grade 1" grouped per class (class has two groups, "math" and "english"). The problem is, if there is a single NA value in "Grade 1" and it's in math in df$class, all values for standard deviation for math become NAs. Ungrouped it's not a problem, NAs are removed correctly before SD is calculated like so:
df$sd_grade1 <- sd(df$grade1, na.rm = TRUE)
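A minimal sketch of the behaviour described above, using hypothetical toy data (the vectors `grade1` and `cls` are made up for illustration). With `na.rm` left at its default `FALSE`, `sd()` returns `NA` whenever its input contains an `NA`. `ave()` applies `sd()` separately to each group, so only the group holding the `NA` is affected.

```r
# Hypothetical toy data: one NA, and it falls in the "maths" group
grade1 <- c(2, 4, NA, 6, 1, 2, 3)
cls    <- c("maths", "maths", "maths", "maths", "english", "english", "english")

# Ungrouped: na.rm = TRUE drops the NA before the SD is computed
overall_sd <- sd(grade1, na.rm = TRUE)

# Grouped with plain sd(): every "maths" row becomes NA,
# while the "english" rows get a normal SD
grouped_sd <- ave(grade1, cls, FUN = sd)

print(overall_sd)
print(grouped_sd)
```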
Basically, I want to omit any NAs in "Grade 1" when calculating the per-group standard deviation stored in a new variable. With ave() and na.rm = TRUE I get an error; na.omit = TRUE gives no error, but it doesn't do anything either. How do I correctly specify this with ave()?
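For comparison, here is a base R sketch with `tapply()`, which forwards its extra `...` arguments to `FUN`, so `na.rm = TRUE` does reach `sd()` there. In `ave(x, ..., FUN = mean)`, by contrast, the `...` slot is documented as the grouping variables. The data frame is hypothetical toy data using the question's column names. The per-group results are then looked up row by row to get a per-row column like the one `ave()` produces.

```r
# Hypothetical toy data with the question's column names
df <- data.frame(
  grade1 = c(2, 4, NA, 6, 1, 2, 3),
  class  = c("maths", "maths", "maths", "maths", "english", "english", "english")
)

# One SD per group; tapply() passes na.rm = TRUE on to sd()
grp_sd <- tapply(df$grade1, df$class, sd, na.rm = TRUE)

# Look up each row's group SD by name. as.character() guards against a
# factor column being used as integer positions; as.vector() drops the
# array attributes that tapply() returns.
df$sd_grp_grade1 <- as.vector(grp_sd[as.character(df$class)])

print(grp_sd)
print(df)
```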
Reproducible example:
df <- data.frame(
grade1 = sample(1:10),
class = sample(c("maths", "english"), 10, replace = TRUE)
)
df$grade1<-car::recode(df$grade1,"3=NA")
# Ungrouped: NAs are omitted from the SD calculation, but every row gets the same SD regardless of group, which is not what I want.
df$sd_grade1 <- sd(df$grade1, na.rm = TRUE)
# Grouped, but na.rm = TRUE is not passed on to sd() here, so the NA in "maths" is not removed.
df$sd_grp_grade1 <- ave(df$grade1, df$class, FUN = sd, na.rm = TRUE)
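A sketch of the usual fix with `ave()` itself. Because everything in `ave()`'s `...` is taken as a grouping variable, `na.rm = TRUE` has to be built into `FUN`, for example with an anonymous function. Wrapping `na.omit()` the same way also works. The data here is hypothetical toy data rather than the random sample above, so the results are predictable.

```r
# Hypothetical toy data: the NA is in the "maths" group
df <- data.frame(
  grade1 = c(2, 4, NA, 6, 1, 2, 3),
  class  = c("maths", "maths", "maths", "maths", "english", "english", "english")
)

# Pass na.rm = TRUE inside FUN rather than through ave()'s ... argument
df$sd_grp_grade1 <- ave(df$grade1, df$class,
                        FUN = function(x) sd(x, na.rm = TRUE))

# Equivalent: drop the NAs explicitly before calling sd()
df$sd_grp_grade1_omit <- ave(df$grade1, df$class,
                             FUN = function(x) sd(na.omit(x)))

print(df)
```

Every row, including the one whose grade1 is NA, receives its group's SD, because ave() recycles the single value returned by FUN across the whole group.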
Using dplyr:
library(dplyr)
df %>%
  group_by(class) %>%
  summarise(SD = sd(grade1, na.rm = TRUE))
Output:
# A tibble: 2 x 2
class SD
<fct> <dbl>
1 english 2.63
2 maths 3.65
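Note that `summarise()` returns one row per group rather than adding a new column to `df`. If a per-row variable like the `ave()` result is wanted, a sketch with dplyr's `group_by()` and `mutate()`, again on hypothetical toy data, would be:

```r
library(dplyr)

# Hypothetical toy data with the question's column names
df <- data.frame(
  grade1 = c(2, 4, NA, 6, 1, 2, 3),
  class  = c("maths", "maths", "maths", "maths", "english", "english", "english")
)

# mutate() keeps every row and fills in the SD of that row's group
df <- df %>%
  group_by(class) %>%
  mutate(sd_grp_grade1 = sd(grade1, na.rm = TRUE)) %>%
  ungroup()

print(df)
```

`ungroup()` removes the grouping so that later operations on `df` are not silently done per class.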