This is a bit more complicated than the title lets on, and I'm sure that if I could describe it better, I could google it better.
I have data that looks like this:
SET ID
100301006 1287025
100301006 1287026
100301010 1287027
100301013 1287030
100301011 1287027
and I would like to identify and select those rows where both values in a row are unique within their respective columns. In the example above, I want to grab only the row:
100301013 1287030
I don't want SET 100301006, since it matches 2 different records in the ID field (1287025 and 1287026). Similarly, I don't want SET 100301010, since the ID record it matches (1287027) can also match another SET (100301011).
In some cases there could be more than 2 matches.
I could do this in loops, but that seems like a hack. I'd love a base R or data.table solution, but I'm not so interested in dplyr (trying to minimize dependencies).
We can use duplicated on each column independently to create a list of logical vectors, Reduce it to a single vector with &, and use that to subset the rows of the dataset:
df1[Reduce(`&`, lapply(df1, function(x)
!(duplicated(x)|duplicated(x, fromLast = TRUE)))),]
# SET ID
#4 100301013 1287030
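To see what the inner expression does, here is the flag vector it builds for the ID column of the example data (a small sketch using the values from the question):

```r
x <- c(1287025L, 1287026L, 1287027L, 1287030L, 1287027L)

duplicated(x)                   # TRUE for the later copies of a repeated value
duplicated(x, fromLast = TRUE)  # TRUE for the earlier copies
keep <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
keep                            # TRUE only where the value occurs exactly once
# [1]  TRUE  TRUE FALSE  TRUE FALSE
```

Combining the two `duplicated` calls with `|` marks every occurrence of a repeated value, not just the second and later ones, which is why the negation keeps only values that appear exactly once.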
Or as @chinsoon12 suggested
m1 <- sapply(df1, function(x) !(duplicated(x)| duplicated(x, fromLast = TRUE)))
df1[rowSums(m1) == ncol(m1),, drop = FALSE]
df1 <- structure(list(SET = c(100301006L, 100301006L, 100301010L, 100301013L,
100301011L), ID = c(1287025L, 1287026L, 1287027L, 1287030L, 1287027L
)), class = "data.frame", row.names = c(NA, -5L))
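Since the question also asks about data.table, a possible equivalent (a sketch, assuming the data frame `df1` above has been converted with `as.data.table` and keeping the column names SET and ID) is:

```r
library(data.table)

dt <- as.data.table(df1)

# keep only rows whose SET value and whose ID value each occur exactly once
res <- dt[SET %in% dt[, .N, by = SET][N == 1L, SET] &
          ID  %in% dt[, .N, by = ID][N == 1L, ID]]
res
# one row: SET 100301013, ID 1287030
```

The `dt[, .N, by = SET][N == 1L, SET]` idiom counts rows per SET and extracts the values seen only once; the same is done for ID, and `%in%` filters the original table against both.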