Idea is to have manageable approach to define rules from some table:
library(data.table)
a <- data.table(rule = c("rule1", "rule2", "rule3"),
bool = c(T,T,F))
a
# rule bool
# 1: rule1 TRUE
# 2: rule2 TRUE
# 3: rule3 FALSE
ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool] & a[rule == "rule3", bool], 1,
ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool], 2,
ifelse(a[rule == "rule2", bool] & a[rule == "rule3", bool], 3, 4)))
# [1] 2
Obviously this is not very sustainable or readable as I keep adding rules.
What would an alternative to ifelse
here?
This is a very interesting problem, in particular as the conditions do not always involve all rows of a
, i.e., rule1
, rule2
, and rule3
.
I have tried to find a general solution which can be expanded for an arbitrary number of conditions as well as for additional rows in a
.
The main idea is to replace the conditions in the nested ifelse()
or case_when()
statements, resp., by a data.table
which then can be joined with a
somehow:
library(data.table)
b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)
E.g., the condition in row 2 specifies to return 2
if rule1
and rule2
both are TRUE
while the value of rule3
does not matter and can be ignored as a wildcard.
It is important to note that the order of conditions matters: First, the conditions without any wildcard must be checked. Then, the conditions with one wildcard, and so on. Finally, if no other match is found the default value is applied (all wildcards). The default value must always be given in the last row.
So, the most specialised conditions come first and the most general last.
The OP already has given the test data a
in long format:
rule bool 1: rule1 TRUE 2: rule2 TRUE 3: rule3 FALSE
Therefore, also the conditions b
are reshaped to long format:
lb <- melt(b[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[
, nr := .N, by = id][]
lb
id result rule bool nr 1: 1 1 rule1 TRUE 3 2: 2 2 rule1 TRUE 2 3: 1 1 rule2 TRUE 3 4: 2 2 rule2 TRUE 2 5: 3 3 rule2 TRUE 2 6: 1 1 rule3 TRUE 3 7: 3 3 rule3 TRUE 2
Before reshaping, a row id
has been added which indicates the order of conditions. Wildcards are omitted from the long format as they are not needed for the joins. After reshaping, the number of remaining rows nr
per id
is appended, i.e., the number of non-wildcard entries.
Now, the conditions are tested:
answer <- lb[a, on = .(rule, bool), nomatch = 0L][
, result[nr == .N], by = .(nr, id)][
order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default
answer
This happens in four steps:
a
and lb
are joined (inner join) on rule
and bool
,id
(nr
is included in the by =
clause just for convenience as it is required in the next step),result
from the most specialised condition,answer
, the default value is returned.For the given a
, above code returns
answer
[1] 2
To verify above code is working correctly, a more thorough test is required
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
test
rule1 rule2 rule3 1: TRUE TRUE TRUE 2: TRUE TRUE FALSE 3: TRUE FALSE TRUE 4: TRUE FALSE FALSE 5: FALSE TRUE TRUE 6: FALSE TRUE FALSE 7: FALSE FALSE TRUE 8: FALSE FALSE FALSE
Each row represents a version of a
which is turned into OP's long format by
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
By looping over i
, all possible combinations of TRUE
/FALSE
values can be tested. In addition some intermediate results are printed which help to understand the workings:
library(magrittr) # piping used here to improve readability
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
for (i in seq(nrow(test))) {
cat("test case", i, "\n")
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool") %T>%
print()
lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], keyby = .(nr, id)] %>%
unique() %>%
print() # intermediate result printed for illustration
answer <- lb[a, on = .(rule, bool), nomatch = 0L][
, result[nr == .N], by = .(nr, id)][
order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default from b
cat("answer = ", answer, "\n\n")
}
test case 1 rule bool 1: rule1 TRUE 2: rule2 TRUE 3: rule3 TRUE nr id V1 1: 2 2 2 2: 2 3 3 3: 3 1 1 answer = 1 test case 2 rule bool 1: rule1 TRUE 2: rule2 TRUE 3: rule3 FALSE nr id V1 1: 2 2 2 answer = 2 test case 3 rule bool 1: rule1 TRUE 2: rule2 FALSE 3: rule3 TRUE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 4 rule bool 1: rule1 TRUE 2: rule2 FALSE 3: rule3 FALSE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 5 rule bool 1: rule1 FALSE 2: rule2 TRUE 3: rule3 TRUE nr id V1 1: 2 3 3 answer = 3 test case 6 rule bool 1: rule1 FALSE 2: rule2 TRUE 3: rule3 FALSE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 7 rule bool 1: rule1 FALSE 2: rule2 FALSE 3: rule3 TRUE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4 test case 8 rule bool 1: rule1 FALSE 2: rule2 FALSE 3: rule3 FALSE Empty data.table (0 rows and 3 cols): nr,id,V1 answer = 4
As can be seen from the answers, the given conditions are all met.
Test case 1 is worth a closer look. Here, the conditions id
1, 2, and 3 may be applicable but condition 1 precedes the other ones as it is the most specialised.
This is to show that the solution can be expanded for more rules in a
as well as for more conditions in b
.
Here is an example with 7 conditions and 4 rule columns.
b4 <- fread(
"rule1, rule2, rule3, rule4, result
TRUE, TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, NA, 2
NA, TRUE, TRUE, NA, 3
NA, FALSE, NA, NA, 5
TRUE, FALSE, NA, NA, 6
FALSE, FALSE, NA, FALSE, 7
NA, NA, NA, NA, 4"
)
The test code has been simplified to allow for a more compact view of the 16 text cases:
lb <- melt(b4[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[, nr := .N, by = id][]
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), rule4 = c(TRUE, FALSE), sorted = FALSE)
sapply(
seq(nrow(test)),
function(i) {
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
answer <- lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], by = .(nr, id)][order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b4[id == max(id), result] # default from b
return(answer)
}
) %>%
cbind(test, .) %>%
setnames(".", "result") %>%
print()
It returns the table of test cases, i.e., different cases of a
in wide format, with the result appended:
rule1 rule2 rule3 rule4 result 1: TRUE TRUE TRUE TRUE 1 2: TRUE TRUE TRUE FALSE 2 3: TRUE TRUE FALSE TRUE 2 4: TRUE TRUE FALSE FALSE 2 5: TRUE FALSE TRUE TRUE 6 6: TRUE FALSE TRUE FALSE 6 7: TRUE FALSE FALSE TRUE 6 8: TRUE FALSE FALSE FALSE 6 9: FALSE TRUE TRUE TRUE 3 10: FALSE TRUE TRUE FALSE 3 11: FALSE TRUE FALSE TRUE 4 12: FALSE TRUE FALSE FALSE 4 13: FALSE FALSE TRUE TRUE 5 14: FALSE FALSE TRUE FALSE 7 15: FALSE FALSE FALSE TRUE 5 16: FALSE FALSE FALSE FALSE 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With