I have made this minimal reproducible example to exemplify my question. I have already managed to solve the problem, but I am sure there are more elegant ways of coding it.
The issue is about binary classification based on multiple criteria. In order to recive a donut (coded 1) requires a score of at least 3 (or more) for: at least one of "a" criterion items, at least two of "b" criterion items and at least three of "c" criterion items. If these requirements aren't met no donut will be awarded (coded 0).
This is my solution. How would you code it more concisely/elegantly?
require(dplyr)
df <- data.frame("a1" = c(3,2,2,5),
"a2" = c(2,1,3,1),
"b1" = c(2,1,5,4),
"b2" = c(1,2,1,4),
"b3" = c(3,2,3,4),
"c1" = c(3,3,1,3),
"c2" = c(4,2,3,4),
"c3" = c(3,3,4,1),
"c4" = c(1,2,3,4),
stringsAsFactors = FALSE)
df_names <- names(df[, 1:9])
a_items <- names(df[, 1:2])
b_items <- names(df[, 3:5])
c_items <- names(df[, 6:9])
df_response <- df %>%
select(df_names) %>%
mutate_all(
funs(case_when(
. >=3 ~ 1,
is.na(.) ~ 0,
TRUE ~ 0))) %>%
mutate(a_crit = case_when( rowSums(.[ ,a_items]) >=1 ~ 1, # one a item needed
TRUE ~ 0)) %>%
mutate(b_crit = case_when( rowSums(.[ ,b_items]) >=2 ~ 1, # two b items needed
TRUE ~ 0)) %>%
mutate(c_crit = case_when( rowSums(.[ ,c_items]) >=3 ~ 1, # three c items needed
TRUE ~ 0)) %>%
mutate(overal_crit = case_when( a_crit == 1 & b_crit == 1 & c_crit == 1 ~ 1,
TRUE ~ 0))
df_response$overal_crit
I'll go with simple mutate
call
library(dplyr)
df %>%
mutate(a_crit = as.integer(rowSums(.[a_items] >= 3) >= 1),
b_crit = as.integer(rowSums(.[b_items] >= 3) >= 2),
c_crit = as.integer(rowSums(.[c_items] >= 3) >= 3),
overal_crit = as.integer((a_crit + b_crit + c_crit) == 3))
# a1 a2 b1 b2 b3 c1 c2 c3 c4 a_crit b_crit c_crit overal_crit
#1 3 2 2 1 3 3 4 3 1 1 0 1 0
#2 2 1 1 2 2 3 2 3 2 0 0 0 0
#3 2 3 5 1 3 1 3 4 3 1 1 1 1
#4 5 1 4 4 4 3 4 1 4 1 1 1 1
If we have several list of names, then it would be easier to do this with map
library(tidyverse)
map2_dfc(list(a_items, b_items, c_items), 1:3, ~
df[.x] %>%
{+(rowSums(. >= 3) >= .y)}) %>%
rename_all(~ paste0(letters[1:3], "_crit")) %>%
mutate(overal_crit = +(rowSums(.)==3) ) %>%
bind_cols(df, .)
# a1 a2 b1 b2 b3 c1 c2 c3 c4 a_crit b_crit c_crit overal_crit
#1 3 2 2 1 3 3 4 3 1 1 0 1 0
#2 2 1 1 2 2 3 2 3 2 0 0 0 0
#3 2 3 5 1 3 1 3 4 3 1 1 1 1
#4 5 1 4 4 4 3 4 1 4 1 1 1 1
NOTE: The number of code lines will be not change here with the number of different vectors for comparison
Or using base R
methods with Map
lst1 <- Map(function(x, y) rowSums(df[x] >= 3) >= y,
list(a_items, b_items, c_items), 1:3)
df[paste0(c(letters[1:3], "overall"), "_crit")] <- c(lst1, list(Reduce(`&`, lst1)))
df %>%
mutate(over_all=if_else(rowSums(.[grepl('a',names(.))]>=3)>=1 &
rowSums(.[grepl('b',names(.))]>=3)>=2 &
rowSums(.[grepl('c',names(.))]>=3)>=3, 1, 0))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With