
Removing NAs before calculating a grouped statistic

Tags: r

How do I define the removal of NAs prior to calculating the standard deviation of a variable per group? I'm using ave() like so:

df$sd_grade1 <- ave(df$grade1, df$class, FUN = sd)

"Grade 1" is a numeric variable from 1 to 7 and it contains one NA value. I want to calculate the standard deviation of "Grade 1" per class (class has two groups, "maths" and "english"). The problem is that if there is a single NA value in "Grade 1" and it falls in the maths group of df$class, every standard deviation value for maths becomes NA. Ungrouped, it's not a problem; NAs are removed correctly before the SD is calculated, like so:

df$sd_grade1 <- sd(df$grade1, na.rm = TRUE)

Basically, I want to omit any NAs that appear in "Grade 1" when calculating the standard deviation per group in a new variable. With ave() and na.rm = TRUE I get an error; na.omit = TRUE returns no error but doesn't do anything either. How do I specify this correctly with ave()?

Reproducible:

df <- data.frame(
  grade1 = sample(1:10),
  class = sample(c("maths", "english"), 10, replace = TRUE)
)

df$grade1 <- car::recode(df$grade1, "3=NA")

# ungrouped, same SD regardless of group and NAs omitted in SD calculation, but this is not something I want.
df$sd_grade1 <- sd(df$grade1, na.rm = TRUE)

# grouped, but na.rm does not work here because "maths" contains one NA.
df$sd_grp_grade1 <- ave(df$grade1, df$class, FUN = sd, na.rm = TRUE)
sysimmie asked Jan 23 '26 06:01



1 Answer

Using dplyr, group by class and pass na.rm = TRUE to sd() inside summarise():

library(dplyr)
df %>%
  group_by(class) %>%
  summarise(SD = sd(grade1, na.rm = TRUE))

Output:

  # A tibble: 2 x 2
  class      SD
  <fct>   <dbl>
1 english  2.63
2 maths    3.65
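
Since the question asks about ave() specifically, here is a minimal base-R sketch of that route. ave() is defined as ave(x, ..., FUN = mean), and everything passed through ... is used as a grouping variable, so na.rm = TRUE never reaches sd(). Wrapping sd() in an anonymous function is one way around that. The set.seed() call, the fixed class assignment via rep(), and the direct NA assignment (standing in for car::recode) are assumptions added here to keep the example self-contained and repeatable.

```r
set.seed(1)  # assumption: fixed seed so the sketch is repeatable

df <- data.frame(
  grade1 = sample(1:10),
  # assumption: fixed 5/5 split instead of a random one, so both groups
  # always have enough non-NA values for sd()
  class = rep(c("maths", "english"), each = 5)
)

# Base-R stand-in for car::recode(df$grade1, "3=NA")
df$grade1[df$grade1 == 3] <- NA

# Wrap sd() so that na.rm = TRUE is handed to sd() itself,
# not to ave(), which would treat it as another grouping variable
df$sd_grp_grade1 <- ave(
  df$grade1, df$class,
  FUN = function(x) sd(x, na.rm = TRUE)
)

df
```

Each row then carries the standard deviation of its own class, which matches the new variable the question asks for, whereas summarise() returns one row per group.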
BetterCallMe answered Jan 25 '26 00:01
