
R: Non-greedy version of setdiff?

Tags:

r

Here's setdiff's normal behaviour:

x <- rep(letters[1:4], 2)
x
# [1] "a" "b" "c" "d" "a" "b" "c" "d"

y <- letters[1:2]
y
# [1] "a" "b"

setdiff(x, y)
# [1] "c" "d"

… but what if I want y to be taken out only once, and therefore get the following result?

# "c" "d" "a" "b" "c" "d"

I'm guessing that there is an easy solution using either setdiff or %in%, but I just cannot see it.

Fr. asked Jan 08 '17 10:01



1 Answer

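match returns a vector of the positions of the (first) matches of its first argument in its second. Used with a minus sign as a negative index, it drops exactly those positions (this assumes every element of y occurs in x; otherwise match returns NA for the missing ones):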

x[-match(y, x)]
# [1] "c" "d" "a" "b" "c" "d"
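
The one-liner relies on every element of y being present in x. A minimal sketch of a guarded variant, assuming you would rather ignore unmatched values than fail (the names idx and res, and the value "z", are only illustrative):

```r
x <- rep(letters[1:4], 2)
y <- c("a", "z")            # "z" does not occur in x

idx <- match(y, x)          # first-match positions; NA for "z"
idx <- idx[!is.na(idx)]     # keep only the real matches

# Guard the empty case: x[-integer(0)] returns an empty vector, not x
res <- if (length(idx)) x[-idx] else x
res
# [1] "b" "c" "d" "a" "b" "c" "d"
```

Dropping the NA positions first avoids mixing NA with negative subscripts, and the length check keeps x intact when nothing matched.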

If there are duplicates in y and you want each value removed as many times as it appears there, the first thing that comes to mind is a for-loop:

y <- c("a","b","a")
x2 <- x
for (i in seq_along(y)) { x2 <- x2[-match(y[i], x2)] }

x2
# [1] "c" "d" "b" "c" "d"
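
The loop can probably be avoided by numbering each occurrence of a value and matching on the (value, occurrence) pair. This is a sketch of that idea, not part of the original answer. The helper occ and the names key_x, key_y and x3 are made up for illustration, and it assumes the values themselves do not already look like "a 1" (which could collide with the pasted keys):

```r
x <- rep(letters[1:4], 2)
y <- c("a", "b", "a")

# Number each occurrence within its own value: 1st "a", 2nd "a", ...
occ <- function(v) ave(seq_along(v), v, FUN = seq_along)

key_x <- paste(x, occ(x))   # "a 1" "b 1" "c 1" "d 1" "a 2" ...
key_y <- paste(y, occ(y))   # "a 1" "b 1" "a 2"

# Drop the k-th copy of a value in x only if y also has a k-th copy
x3 <- x[!key_x %in% key_y]
x3
# [1] "c" "d" "b" "c" "d"
```

Like the loop, this removes the earliest occurrences first. Unlike the loop, surplus or unknown values in y are simply ignored.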

Here is one possible result of a tabling approach, which returns the remaining count of each value rather than the vector itself. It uses some "set" functions, although this is not really a set problem, and it seems somewhat more "vectorised":

c(table(x[x %in% intersect(x, y)]) - table(y[y %in% intersect(x, y)]),
  table(x[!x %in% intersect(x, y)]))
# a b c d 
# 0 1 2 2 
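
If a vector is wanted instead of counts, the table can be expanded with rep. This is a rough sketch with illustrative names (common, counts, res). Note that the original order of x is lost, because the result is grouped by value, and that the clamp at zero is an added assumption for the case where y holds more copies of a value than x does:

```r
x <- rep(letters[1:4], 2)
y <- c("a", "b", "a")

common <- intersect(x, y)
counts <- c(table(x[x %in% common]) - table(y[y %in% common]),
            table(x[!x %in% common]))

# Repeat each name by its remaining count, never fewer than zero times
res <- rep(names(counts), times = pmax(counts, 0))
res
# [1] "b" "c" "c" "d" "d"
```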
IRTFM answered Nov 06 '22 21:11
