Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Compare character strings across multiple columns to character string in a single column by row

I am trying to create a variable that is a logical value when comparing one character string to more than two other character strings in a data.table and I need to ignore NA's.

Sample data for D2:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1")), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000015eb1261ef0>)

Attempted a proposed solution below:

D2[, Match := apply(sapply(.SD, `==`, D2[, "var1"]), 1, any), .SDcols = 
c("var2", "var3")]

Result for a003 is TRUE whereas it should be FALSE because var1 and var3 don't match:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1"), Match = c(TRUE, TRUE, TRUE)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x0000015eb1261ef0>)

Desired Result:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1"), Match = c(TRUE, TRUE, FALSE)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x0000015eb1261ef0>)
like image 260
user3594490 Avatar asked Sep 17 '19 22:09

user3594490


2 Answers

How about the following

setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, any), .SDcols = c("var2", "var3")]
D1
#ID  var1  var2  var3 Match
#1: a001 char1 char1 char1  TRUE
#2: a002 char1  <NA> char1  TRUE
#3: a003 char2 char1 char1 FALSE

Explanation: We compare entries in the sub-data.table defined through .SDcols with entries in D1[, "var1"]; if there is any match, return TRUE, else FALSE.


Update

In response to your comment, you can do

setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, all, na.rm = T), .SDcols = c("var2", "var3")]
like image 120
Maurits Evers Avatar answered Nov 08 '22 15:11

Maurits Evers


Another option is:

D2[, m := Reduce(`&`, data.table(
        sweep(as.matrix(.SD[, -"var1"]), 1L, .SD[["var1"]], function(x, y) is.na(x) | x==y))), 
    .SDcols=var1:var3]

Or using melt:

D2[, m := melt(D2, id.vars=c("ID", "var1"))[, any(var1==value, na.rm=TRUE), ID]$V1]

data:

library(data.table)
D2 <- structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
    "char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
        "char1", "char1")), row.names = c(NA, -3L), class = c("data.table", 
            "data.frame"))
setDT(D2)
like image 33
chinsoon12 Avatar answered Nov 08 '22 15:11

chinsoon12