
Remove duplicates across multiple vectors

Tags: r, duplicates

I want to remove every value that appears in more than one vector, keeping no copy of it in any of them. For example, for these vectors:

a <- c("dog", "fish", "cow")
b <- c("dog", "horse", "mouse")
c <- c("cat", "sheep", "mouse")

the expected result would be:

a <- c("fish", "cow")
b <- c("horse")
c <- c("cat", "sheep")

Is there a way to achieve this without concatenating the vectors and splitting them again?

Ben asked Feb 16 '26 07:02


2 Answers

You could perhaps do:

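vec <- c(a, b, c)
# lapply() always returns a list; sapply() would simplify to a matrix if all results had equal length
lapply(list(a, b, c), function(x) x[!x %in% vec[duplicated(vec)]])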

[[1]]
[1] "fish" "cow" 

[[2]]
[1] "horse"

[[3]]
[1] "cat"   "sheep"

If you need the results written back to the individual variables in the global environment, build a named list with lst() from tibble and pass the result to list2env():

library(tibble)  # provides lst(), which names each element after its variable

vec <- c(a, b, c)
l <- lapply(lst(a, b, c), function(x) x[!x %in% vec[duplicated(vec)]])
list2env(l, envir = .GlobalEnv)
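
If you would rather not load tibble just for lst(), something along these lines should work in base R. This is only a sketch: it assumes the three vectors from the question, uses mget() to build the named list, and the names vecs, pooled, dups and res are made up for illustration.

```r
a <- c("dog", "fish", "cow")
b <- c("dog", "horse", "mouse")
c <- c("cat", "sheep", "mouse")

# mget() returns a named list of the variables with these names
vecs <- mget(c("a", "b", "c"), envir = environment())

# Values that occur more than once in the pooled data
pooled <- unlist(vecs, use.names = FALSE)
dups <- unique(pooled[duplicated(pooled)])

# Drop those values from every vector, keeping the list names
res <- lapply(vecs, function(x) x[!x %in% dups])

# Write the filtered vectors back over a, b and c
list2env(res, envir = environment())
```

Note that this overwrites a, b and c in the current environment, so keep copies if you still need the originals.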
tmfmnk answered Feb 19 '26 22:02


If the data are in a named list, e.g., lst <- list(a = a, b = b, c = c), you can try one of the following:

  • Option 1: stack the list into a data frame, keep the values that occur exactly once, then unstack
> unstack(subset(stack(lst), ave(seq_along(values), values, FUN = length) == 1))
$a
[1] "fish" "cow"

$b
[1] "horse"

$c
[1] "cat"   "sheep"
  • Option 2: for each vector, take the set difference against all the other vectors combined
> lapply(seq_along(lst), \(k) setdiff(lst[[k]], unlist(lst[-k])))
[[1]]
[1] "fish" "cow"

[[2]]
[1] "horse"

[[3]]
[1] "cat"   "sheep"
  • Option 3: tabulate all values, then intersect each vector with the values that occur exactly once
> v <- names(which(table(unlist(lst)) == 1))

> lapply(lst, intersect, v)
$a
[1] "fish" "cow"

$b
[1] "horse"

$c
[1] "cat"   "sheep"
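
One thing that may matter with other data: the options agree on the example above, but they can differ when a value is repeated inside a single vector. A small sketch, using made-up input and the illustrative names by_count and by_setdiff, where "fish" occurs twice in a and nowhere else:

```r
# Hypothetical input: "fish" is repeated inside a, "dog" is shared by b and c
lst <- list(a = c("fish", "fish", "cow"),
            b = c("dog", "horse"),
            c = c("dog", "cat"))

# Count-based filter (as in Option 3): keep values seen exactly once overall
pooled <- unlist(lst, use.names = FALSE)
once <- names(which(table(pooled) == 1))
by_count <- lapply(lst, intersect, once)

# Set difference against the other vectors (as in Option 2)
by_setdiff <- lapply(seq_along(lst), function(k) {
  setdiff(lst[[k]], unlist(lst[-k], use.names = FALSE))
})

by_count$a       # "cow"
by_setdiff[[1]]  # "fish" "cow"
```

So the count-based approaches treat a within-vector repeat as a duplicate and drop it, while setdiff() only compares against the other vectors and returns the value once. Which behaviour you want depends on what counts as a duplicate in your data.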
ThomasIsCoding answered Feb 19 '26 22:02