Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculation with multiple column conditional on data being there

Tags:

r

I am trying to solve is how to calculate the weighted score for each class each month.

Each class has multiple students and the weight (contribution) of a student's score varies through time.

To be included in the calculation a student must have both score and weight.

I am a bit lost and none of the approaches I have used have worked.

 Student Class Jan_18_score Feb_18_score Jan_18_Weight Feb_18_Weight
    Adam   1       3             2           150           153 
    Char   1       5             7           30            60
    Fred   1       -7            8           NA            80
    Greg   1       2             NA          80            40
    Ed     2       1             2           60            80
    Mick   2       NA            6           80            30
    Dave   3       5             NA          40            25
    Nick   3       8             8           12            45  
    Tim    3       -2            7           23            40 
    George 3       5             3           65            NA
    Tom    3       NA            8           78            50

The overall goal is to calculate the weighted score for each class each month.

Taking Class 1 (first 4 rows) as an example and looking at Jan_18.

-The observations of Adam, Char and Greg are valid since they have both scores and weights. Their scores and weights should be included - Fred does not have a Jan_18_weight, therefore both his Jan_18_score and Jan_18_weight are excluded from the calculation.

The following calculation should then occur:

= [(3*150)+(5*30)+(2*80)]/ [150+30+80] = 2.92307

This calculation would be repeated for each class and each month.

A new dataframe something like the following should be the output

  Class Jan_18_Weight_Score Feb_18_Weight_Score 
    1            2.92307           etc
    2             etc              etc
    3             etc              etc

There are many columns and many rows.

Any help is appreciated.

like image 452
CISCO Avatar asked Mar 15 '26 16:03

CISCO


1 Answers

Here's a way with tidyverse. The main trick is to replace NA with 0 in "weights" columns and then use weighted.mean() with na.rm = T to ignore NA scores. To do so, you can gather the scores and weights into a single column and then group by Class and month_abb (a calculated field for grouping) and then use weighted.mean().

df %>%
  mutate_at(vars(ends_with("Weight")), ~replace_na(., 0)) %>% 
  gather(month, value, -Student, -Class) %>% 
  group_by(Class, month_abb = paste0(substr(month, 1, 3), "_Weight_Score")) %>%
  summarize(
    weight_score = weighted.mean(value[grepl("score", month)], value[grepl("Weight", month)], na.rm = T)
  ) %>% 
  ungroup() %>%
  spread(month_abb, weight_score)

# A tibble: 3 x 3
  Class Feb_Weight_Score Jan_Weight_Score
  <int>            <dbl>            <dbl>
1     1             4.66             2.92
2     2             3.09             1   
3     3             7.70             4.11

Data -

df <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", 
"Mick", "Dave", "Nick", "Tim", "George", "Tom"), Class = c(1L, 
1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_18_score = c(3L, 
5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), Feb_18_score = c(2L, 
7L, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_18_Weight = c(150L, 
30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), Feb_18_Weight = c(153L, 
60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, 
-11L))
like image 148
Shree Avatar answered Mar 18 '26 06:03

Shree