I am trying to create a variable that is a logical value when comparing one character string to more than two other character strings in a data.table and I need to ignore NA's.
Sample data for D2:
structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1",
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1",
"char1", "char1")), row.names = c(NA, -3L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000015eb1261ef0>)
Attempted a proposed solution below:
D2[, Match := apply(sapply(.SD, `==`, D2[, "var1"]), 1, any), .SDcols =
c("var2", "var3")]
Result for a003 is TRUE whereas it should be FALSE because var1 and var3 don't match:
structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1",
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1",
"char1", "char1"), Match = c(TRUE, TRUE, TRUE)), row.names = c(NA,
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer:
0x0000015eb1261ef0>)
Desired Result:
structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1",
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1",
"char1", "char1"), Match = c(TRUE, TRUE, FALSE)), row.names = c(NA,
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer:
0x0000015eb1261ef0>)
How about the following
setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, any), .SDcols = c("var2", "var3")]
D1
#ID var1 var2 var3 Match
#1: a001 char1 char1 char1 TRUE
#2: a002 char1 <NA> char1 TRUE
#3: a003 char2 char1 char1 FALSE
Explanation: We compare entries in the sub-data.table
defined through .SDcols
with entries in D1[, "var1"]
; if there is any
match, return TRUE
, else FALSE
.
In response to your comment, you can do
setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, all, na.rm = T), .SDcols = c("var2", "var3")]
Another option is:
D2[, m := Reduce(`&`, data.table(
sweep(as.matrix(.SD[, -"var1"]), 1L, .SD[["var1"]], function(x, y) is.na(x) | x==y))),
.SDcols=var1:var3]
Or using melt
:
D2[, m := melt(D2, id.vars=c("ID", "var1"))[, any(var1==value, na.rm=TRUE), ID]$V1]
data:
library(data.table)
D2 <- structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1",
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1",
"char1", "char1")), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
setDT(D2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With