Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rule tables to avoiding nested ifelse statements

Tags:

r

Idea is to have manageable approach to define rules from some table:

library(data.table)

a <- data.table(rule = c("rule1", "rule2", "rule3"),
                bool = c(T,T,F))

a
#     rule  bool
# 1: rule1  TRUE
# 2: rule2  TRUE
# 3: rule3 FALSE

ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool] & a[rule == "rule3", bool], 1,
       ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool], 2,
              ifelse(a[rule == "rule2", bool] & a[rule == "rule3", bool], 3, 4)))
# [1] 2

Obviously this is not very sustainable or readable as I keep adding rules. What would an alternative to ifelse here?

like image 635
statespace Avatar asked Sep 18 '25 13:09

statespace


1 Answers

This is a very interesting problem, in particular as the conditions do not always involve all rows of a, i.e., rule1, rule2, and rule3.

I have tried to find a general solution which can be expanded for an arbitrary number of conditions as well as for additional rows in a.

The main idea is to replace the conditions in the nested ifelse() or case_when() statements, resp., by a data.table which then can be joined with a somehow:

library(data.table)
b <- fread(
"rule1, rule2, rule3, result
TRUE,  TRUE,  TRUE,  1
TRUE,  TRUE,  NA,    2
NA,    TRUE,  TRUE,  3
NA,    NA,    NA,    4"
)

E.g., the condition in row 2 specifies to return 2 if rule1 and rule2 both are TRUE while the value of rule3 does not matter and can be ignored as a wildcard.

It is important to note that the order of conditions matters: First, the conditions without any wildcard must be checked. Then, the conditions with one wildcard, and so on. Finally, if no other match is found the default value is applied (all wildcards). The default value must always be given in the last row.
So, the most specialised conditions come first and the most general last.

The OP already has given the test data a in long format:

    rule  bool
1: rule1  TRUE
2: rule2  TRUE
3: rule3 FALSE

Therefore, also the conditions b are reshaped to long format:

lb <- melt(b[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[
  , nr := .N, by = id][]

lb
   id result  rule bool nr
1:  1      1 rule1 TRUE  3
2:  2      2 rule1 TRUE  2
3:  1      1 rule2 TRUE  3
4:  2      2 rule2 TRUE  2
5:  3      3 rule2 TRUE  2
6:  1      1 rule3 TRUE  3
7:  3      3 rule3 TRUE  2

Before reshaping, a row id has been added which indicates the order of conditions. Wildcards are omitted from the long format as they are not needed for the joins. After reshaping, the number of remaining rows nr per id is appended, i.e., the number of non-wildcard entries.

Now, the conditions are tested:

answer <- lb[a, on = .(rule, bool), nomatch = 0L][
  , result[nr == .N], by = .(nr, id)][
    order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default
answer

This happens in four steps:

  1. a and lb are joined (inner join) on rule and bool,
  2. as the data to join are in long format, incomplete conditions are removed by checking the number of conditions for each id (nr is included in the by = clause just for convenience as it is required in the next step),
  3. The remaining rows are ordered to pick the first result from the most specialised condition,
  4. If above operation does not return an answer, the default value is returned.

For the given a, above code returns

answer
[1] 2

More test cases

To verify above code is working correctly, a more thorough test is required

test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
test
   rule1 rule2 rule3
1:  TRUE  TRUE  TRUE
2:  TRUE  TRUE FALSE
3:  TRUE FALSE  TRUE
4:  TRUE FALSE FALSE
5: FALSE  TRUE  TRUE
6: FALSE  TRUE FALSE
7: FALSE FALSE  TRUE
8: FALSE FALSE FALSE

Each row represents a version of a which is turned into OP's long format by

a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")

By looping over i, all possible combinations of TRUE/FALSE values can be tested. In addition some intermediate results are printed which help to understand the workings:

library(magrittr) # piping used here to improve readability
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
for (i in seq(nrow(test))) {
  cat("test case", i, "\n")
  a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool") %T>% 
    print()
  lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], keyby = .(nr, id)] %>% 
    unique() %>%
    print() # intermediate result printed for illustration
  answer <- lb[a, on = .(rule, bool), nomatch = 0L][
    , result[nr == .N], by = .(nr, id)][
      order(-nr, id), first(V1)]
  if (length(answer) == 0L) answer <- b[id == max(id), result] # default from b
  cat("answer = ", answer, "\n\n")
}
test case 1 
    rule bool
1: rule1 TRUE
2: rule2 TRUE
3: rule3 TRUE
   nr id V1
1:  2  2  2
2:  2  3  3
3:  3  1  1
answer =  1 

test case 2 
    rule  bool
1: rule1  TRUE
2: rule2  TRUE
3: rule3 FALSE
   nr id V1
1:  2  2  2
answer =  2 

test case 3 
    rule  bool
1: rule1  TRUE
2: rule2 FALSE
3: rule3  TRUE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 4 
    rule  bool
1: rule1  TRUE
2: rule2 FALSE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 5 
    rule  bool
1: rule1 FALSE
2: rule2  TRUE
3: rule3  TRUE
   nr id V1
1:  2  3  3
answer =  3 

test case 6 
    rule  bool
1: rule1 FALSE
2: rule2  TRUE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 7 
    rule  bool
1: rule1 FALSE
2: rule2 FALSE
3: rule3  TRUE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4 

test case 8 
    rule  bool
1: rule1 FALSE
2: rule2 FALSE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer =  4

As can be seen from the answers, the given conditions are all met.

Test case 1 is worth a closer look. Here, the conditions id 1, 2, and 3 may be applicable but condition 1 precedes the other ones as it is the most specialised.

Expanding

This is to show that the solution can be expanded for more rules in a as well as for more conditions in b.

Here is an example with 7 conditions and 4 rule columns.

b4 <- fread(
  "rule1, rule2, rule3, rule4, result
TRUE,  TRUE,  TRUE,  TRUE,  1
TRUE,  TRUE,  NA,    NA,    2
NA,    TRUE,  TRUE,  NA,    3
NA,    FALSE, NA,    NA,    5
TRUE,  FALSE, NA,    NA,    6
FALSE, FALSE, NA, FALSE,    7
NA,    NA,    NA,    NA,    4"
)

The test code has been simplified to allow for a more compact view of the 16 text cases:

lb <- melt(b4[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[, nr := .N, by = id][]
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), rule4 = c(TRUE, FALSE), sorted = FALSE)
sapply(
  seq(nrow(test)), 
  function(i) {
    a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
    answer <- lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], by = .(nr, id)][order(-nr, id), first(V1)]
    if (length(answer) == 0L) answer <- b4[id == max(id), result] # default from b
    return(answer)
  }
) %>% 
  cbind(test, .) %>% 
  setnames(".", "result") %>% 
  print()

It returns the table of test cases, i.e., different cases of a in wide format, with the result appended:

    rule1 rule2 rule3 rule4 result
 1:  TRUE  TRUE  TRUE  TRUE      1
 2:  TRUE  TRUE  TRUE FALSE      2
 3:  TRUE  TRUE FALSE  TRUE      2
 4:  TRUE  TRUE FALSE FALSE      2
 5:  TRUE FALSE  TRUE  TRUE      6
 6:  TRUE FALSE  TRUE FALSE      6
 7:  TRUE FALSE FALSE  TRUE      6
 8:  TRUE FALSE FALSE FALSE      6
 9: FALSE  TRUE  TRUE  TRUE      3
10: FALSE  TRUE  TRUE FALSE      3
11: FALSE  TRUE FALSE  TRUE      4
12: FALSE  TRUE FALSE FALSE      4
13: FALSE FALSE  TRUE  TRUE      5
14: FALSE FALSE  TRUE FALSE      7
15: FALSE FALSE FALSE  TRUE      5
16: FALSE FALSE FALSE FALSE      7
like image 158
Uwe Avatar answered Sep 21 '25 04:09

Uwe