How to check whether any of the elements in a row are the same as in the row above

Question

Is there a way to check whether any value in a row is identical to any value in the row above, regardless of order? Below is an arbitrary example input table.

library(data.table)

DT <- data.table(A=c("a","a","b","d","e","f","h","i","j"),
                 B=c("a","b","c","c","f","g",NA,"j",NA),
                 C=c("a","b","c","b","g","h",NA,NA,NA))

> DT
   A  B  C
1: a  a  a
2: a  b  b
3: b  c  c
4: d  c  b
5: e  f  g
6: f  g  h
7: h NA NA
8: i  j NA
9: j NA NA

I would like to add a column D that compares each row with the row above and indicates whether any values of the two rows are identical (regardless of order). The desired output would be:

 > DT
   A  B  C  D
1: a  a  a  0 #No row above to compare; could be either NA or 0
2: a  b  b  1 #row 2 has "a", which is in row 1; returns 1
3: b  c  c  1 #row 3 has "b", which is in row 2; returns 1
4: d  c  b  1 #row 4 has "b" and "c", which are in row 3; returns 1
5: e  f  g  0 #row 5 has nothing that is in row 4; returns 0
6: f  g  h  1 #row 6 has "f" and "g", which are in row 5; returns 1
7: h NA NA  1 #row 7 has "h", which is in row 6; returns 1
8: i  j NA  0 #row 8 has nothing that is in row 7 (NA doesn't count)
9: j NA NA  1 #row 9 has "j", which is in row 8; returns 1 (NA doesn't count)

The main idea is that I would like to compare a row (a vector) with another row (vector) and treat the two as matching if they share at least one element, without iterating over the elements to compare them one by one.
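For a single pair of rows, the check itself can be written without an explicit loop in base R, because %in% is vectorised and ignores order. This is a minimal sketch assuming each row is available as a character vector; shares_value is a hypothetical helper name. NA is dropped first because NA %in% NA is TRUE in R, which would otherwise count a shared NA as a match.

```r
# Hypothetical helper: TRUE if x and y share at least one non-NA value
shares_value <- function(x, y) {
  x <- x[!is.na(x)]   # drop NA so that a shared NA never counts as a match
  y <- y[!is.na(y)]
  any(x %in% y)       # vectorised membership test, order does not matter
}

shares_value(c("d", "c", "b"), c("b", "c", "c"))  # rows 4 and 3: TRUE
shares_value(c("i", "j", NA), c("h", NA, NA))     # rows 8 and 7: FALSE
```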

akrun · Accepted Answer

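We can do this by taking the lead (next) rows of the dataset with shift, pasting each lead row into a regex pattern of the form "x|y|z" (with the NA entries removed), pasting each original row into a single string, checking each pattern against the row above with grepl inside Map, and finally unlisting and converting to integer: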

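DT[, D := {
     v1 <- do.call(paste, .SD)                                      # each row as one string, e.g. "b c c"
     v2 <- do.call(paste, c(shift(.SD, type = "lead"), sep = "|"))  # next row as a regex, e.g. "d|c|b"
     v2N <- gsub("NA\\|*|\\|*NA", "", v2)                           # remove the NA alternatives from each pattern
     v3 <- unlist(Map(grepl, v2N, v1), use.names = FALSE)           # does row i+1 share a value with row i?
     as.integer(head(c(FALSE, v3), -1))                             # shift down one position; row 1 gets 0
}]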

DT
#   A  B  C D
#1: a  a  a 0
#2: a  b  b 1
#3: b  c  c 1
#4: d  c  b 1
#5: e  f  g 0
#6: f  g  h 1
#7: h NA NA 1
#8: i  j NA 0
#9: j NA NA 1
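To see what the paste/grepl step does for one pair of rows, here is a base-R sketch using rows 3 and 4 of the example. Note that grepl performs regular-expression substring matching, so a pattern such as "a" also matches a value like "ab". Wrapping the pattern in word boundaries is one possible mitigation shown here as an assumption (it is not part of the answer above, and it still assumes the values are made of word characters with no regex metacharacters).

```r
row3 <- c("b", "c", "c")                     # the row above
row4 <- c("d", "c", "b")                     # the current row
pattern <- paste(row4, collapse = "|")       # "d|c|b": any of row 4's values
grepl(pattern, paste(row3, collapse = " "))  # TRUE: "c" and "b" occur in "b c c"

# Substring pitfall: "a" is found inside "ab" although no value equals "a"
grepl("a", "ab cd")                          # TRUE

# Possible mitigation (an assumption, not from the answer): word boundaries
grepl(paste0("\\b(", "a", ")\\b"), "ab cd", perl = TRUE)  # FALSE
```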

Alternatively, we can split the data into single rows and compare each pair of consecutive rows using Map:

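# Compare each row with the row above, using only columns A, B and C
prev <- split(DT[-nrow(DT), .(A, B, C)], seq_len(nrow(DT) - 1))  # rows 1 .. n-1
curr <- split(DT[-1, .(A, B, C)], seq_len(nrow(DT) - 1))         # rows 2 .. n
as.integer(c(FALSE, unlist(Map(function(x, y) {
     x1 <- na.omit(unlist(x))   # non-NA values of the row above
     y1 <- na.omit(unlist(y))   # non-NA values of the current row
     any(x1 %in% y1)            # the check is symmetric, so one direction suffices
}, prev, curr), use.names = FALSE)))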
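The same split-and-Map idea can be written in plain base R on a data.frame, which may help if data.table is not available. This is a sketch assuming the comparison columns are A, B and C; df, rows and match_prev are illustrative names, not part of the answer.

```r
df <- data.frame(A = c("a","a","b","d","e","f","h","i","j"),
                 B = c("a","b","c","c","f","g",NA,"j",NA),
                 C = c("a","b","c","b","g","h",NA,NA,NA),
                 stringsAsFactors = FALSE)

# One character vector per row, restricted to the columns being compared
rows <- lapply(seq_len(nrow(df)), function(i) unlist(df[i, c("A", "B", "C")]))

# Pair row i-1 (prev) with row i (curr); TRUE if they share a non-NA value
match_prev <- unlist(Map(function(prev, curr) {
  any(na.omit(curr) %in% na.omit(prev))
}, rows[-length(rows)], rows[-1]))

df$D <- as.integer(c(FALSE, match_prev))   # row 1 has no row above, so 0
df$D                                       # 0 1 1 1 0 1 1 0 1
```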

lmo · Answer

Here is another method. It is probably not advisable on large data.tables because it uses by=1:nrow(DT), which tends to be quite slow.

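DT[, D := sign(DT[, c(.SD, shift(.SD)), .SDcols = c("A", "B", "C")][,  # A, B, C plus lagged copies V4, V5, V6
   sum(!is.na(intersect(unlist(.SD[, .(A, B, C)]), unlist(.SD[, .(V4, V5, V6)])))),  # count shared non-NA values
   by = 1:nrow(DT)]$V1)]  # one group per row; sign() maps the counts to 0/1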

Here, DT[, c(.SD, shift(.SD))] creates a copy of the data.table with the lagged variables bound on as extra columns (named V4, V5 and V6). The second chain then intersects the unlisted original variables (A, B, C) with the unlisted lagged variables. In the intersection, NAs count as 0 and non-NAs as 1, and these are summed, so a shared NA does not count as a match. This is done for each row of the copied data.table. The sums are extracted with $V1 and turned into binary values (0 and 1) using sign.
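The NA handling described above can be checked on single pairs of rows in base R. Because intersect keeps NA when both vectors contain it, sum(!is.na(...)) is what prevents a shared NA from counting as a match. This sketch uses rows 8/7 and 7/6 of the example; count_shared is a hypothetical helper name.

```r
# Hypothetical helper mirroring the per-row logic of the answer
count_shared <- function(curr, prev) {
  common <- intersect(curr, prev)   # may contain NA if both rows have NA
  sign(sum(!is.na(common)))         # 1 if any non-NA value is shared, else 0
}

intersect(c("i", "j", NA), c("h", NA, NA))      # NA: only the missing values coincide
count_shared(c("i", "j", NA), c("h", NA, NA))   # 0 (row 8 vs row 7)
count_shared(c("h", NA, NA), c("f", "g", "h"))  # 1 (row 7 vs row 6)
```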

It returns

DT
   A  B  C D
1: a  a  a 0
2: a  b  b 1
3: b  c  c 1
4: d  c  b 1
5: e  f  g 0
6: f  g  h 1
7: h NA NA 1
8: i  j NA 0
9: j NA NA 1
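Since by=1:nrow(DT) is noted above as slow on large tables, one way to avoid a per-row loop is to compare whole columns against the lagged columns, looping only over the small number of column pairs. This is a base-R sketch that is not part of either answer; it assumes the columns to compare are A, B and C, that exact equality of values is what counts as a match, and df, m, prev and hit are illustrative names.

```r
df <- data.frame(A = c("a","a","b","d","e","f","h","i","j"),
                 B = c("a","b","c","c","f","g",NA,"j",NA),
                 C = c("a","b","c","b","g","h",NA,NA,NA),
                 stringsAsFactors = FALSE)

m <- as.matrix(df[, c("A", "B", "C")])
# Lagged copy: row i of `prev` holds row i-1 of `m`; the first row is all NA
prev <- rbind(NA, m[-nrow(m), , drop = FALSE])

hit <- rep(FALSE, nrow(m))
for (i in seq_len(ncol(m))) {          # 3 x 3 = 9 column comparisons in total,
  for (j in seq_len(ncol(prev))) {     # each one vectorised over all rows
    eq <- m[, i] == prev[, j]          # NA where either side is NA
    hit <- hit | (!is.na(eq) & eq)     # treat NA comparisons as "no match"
  }
}
df$D <- as.integer(hit)
df$D                                   # 0 1 1 1 0 1 1 0 1
```

The loop runs over column pairs rather than rows, so its cost grows with the number of columns squared while each comparison is vectorised across all rows.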

Tags: r, data.table

wyatt

