Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr: case_when() over multiple columns with multiple conditions

Tags:

r

dplyr

I have made this minimal reproducible example to exemplify my question. I have already managed to solve the problem, but I am sure there are more elegant ways of coding it.

The issue is about binary classification based on multiple criteria. In order to recive a donut (coded 1) requires a score of at least 3 (or more) for: at least one of "a" criterion items, at least two of "b" criterion items and at least three of "c" criterion items. If these requirements aren't met no donut will be awarded (coded 0).

This is my solution. How would you code it more concisely/elegantly?

require(dplyr)
df <- data.frame("a1" = c(3,2,2,5), 
                 "a2" = c(2,1,3,1),
                 "b1" = c(2,1,5,4),
                 "b2" = c(1,2,1,4),
                 "b3" = c(3,2,3,4),
                 "c1" = c(3,3,1,3),
                 "c2" = c(4,2,3,4),
                 "c3" = c(3,3,4,1),
                 "c4" = c(1,2,3,4),
stringsAsFactors = FALSE)

df_names <- names(df[, 1:9])
a_items <- names(df[, 1:2])
b_items <- names(df[, 3:5])
c_items <- names(df[, 6:9])

df_response <-  df %>% 
  select(df_names) %>% 
  mutate_all(
    funs(case_when(
      . >=3 ~ 1,
      is.na(.) ~ 0,
      TRUE  ~  0))) %>% 

  mutate(a_crit = case_when( rowSums(.[ ,a_items]) >=1 ~ 1,    # one a item needed
                             TRUE  ~  0)) %>% 
  mutate(b_crit = case_when( rowSums(.[ ,b_items]) >=2 ~ 1,    # two b items needed
                             TRUE  ~  0)) %>% 
  mutate(c_crit = case_when( rowSums(.[ ,c_items]) >=3 ~ 1,    # three c items needed
                             TRUE  ~  0)) %>% 
  mutate(overal_crit = case_when( a_crit == 1 & b_crit == 1 & c_crit == 1 ~ 1,
                                  TRUE  ~  0)) 

df_response$overal_crit
like image 703
Claudiu Papasteri Avatar asked Feb 06 '19 16:02

Claudiu Papasteri


3 Answers

I'll go with simple mutate call

library(dplyr)

df %>%
  mutate(a_crit = as.integer(rowSums(.[a_items] >= 3) >= 1), 
         b_crit = as.integer(rowSums(.[b_items] >= 3) >= 2), 
         c_crit = as.integer(rowSums(.[c_items] >= 3) >= 3), 
         overal_crit = as.integer((a_crit + b_crit + c_crit) == 3))

#  a1 a2 b1 b2 b3 c1 c2 c3 c4 a_crit b_crit c_crit overal_crit
#1  3  2  2  1  3  3  4  3  1      1      0      1           0
#2  2  1  1  2  2  3  2  3  2      0      0      0           0
#3  2  3  5  1  3  1  3  4  3      1      1      1           1
#4  5  1  4  4  4  3  4  1  4      1      1      1           1
like image 102
Ronak Shah Avatar answered Nov 16 '22 04:11

Ronak Shah


If we have several list of names, then it would be easier to do this with map

library(tidyverse)
map2_dfc(list(a_items, b_items, c_items), 1:3, ~  
          df[.x]  %>% 
             {+(rowSums(. >= 3) >= .y)}) %>%
      rename_all(~ paste0(letters[1:3], "_crit")) %>% 
      mutate(overal_crit = +(rowSums(.)==3) ) %>%
      bind_cols(df, .)
#  a1 a2 b1 b2 b3 c1 c2 c3 c4 a_crit b_crit c_crit overal_crit
#1  3  2  2  1  3  3  4  3  1      1      0      1           0
#2  2  1  1  2  2  3  2  3  2      0      0      0           0
#3  2  3  5  1  3  1  3  4  3      1      1      1           1
#4  5  1  4  4  4  3  4  1  4      1      1      1           1

NOTE: The number of code lines will be not change here with the number of different vectors for comparison


Or using base R methods with Map

lst1 <- Map(function(x, y) rowSums(df[x] >= 3) >= y, 
        list(a_items, b_items, c_items), 1:3)
df[paste0(c(letters[1:3], "overall"),  "_crit")] <- c(lst1, list(Reduce(`&`, lst1))) 
like image 2
akrun Avatar answered Nov 16 '22 04:11

akrun


df %>% 
   mutate(over_all=if_else(rowSums(.[grepl('a',names(.))]>=3)>=1 & 
                           rowSums(.[grepl('b',names(.))]>=3)>=2 & 
                           rowSums(.[grepl('c',names(.))]>=3)>=3, 1, 0))
like image 1
A. Suliman Avatar answered Nov 16 '22 06:11

A. Suliman