
Unexpected behavior for setdiff() function in R

Tags: r

As I understand it, setdiff() compares two vectors and gives the elements that occur in one vector but do not occur in the other. If that's so, then given these vectors...

thing1 <- c(1,2,3)
thing2 <- c(2,3,4)
thing3 <- c(1,2,3)

...here are my results.

> setdiff(thing1, thing2)
[1] 1

> setdiff(thing2, thing3)
[1] 4

> setdiff(thing1, thing3)
numeric(0)

Shouldn't the comparison of thing1 and thing2 produce the same result as comparing thing2 and thing3? How can I get an 'outer join' sort of result (a symmetric set difference) that shows every element of the union of thing1 and thing2 that is missing from one of the two vectors? I'd prefer to know how to do this in base R, but would also appreciate a data.table approach. Thanks in advance.

strongbad03 asked May 31 '16 14:05


1 Answer

setdiff() returns the asymmetric difference: setdiff(x, y) gives the elements of x that are not in y. In this case, it does what it says on the tin.

Shouldn't the comparison of thing1 and thing2 produce the same result as comparing thing2 and thing3?

Well, no. But it will produce the same result as comparing thing3 and thing2, since thing1 and thing3 hold the same elements. The order of the arguments matters. Consider your first two examples:

The first example asks: what is in thing1 that is not in thing2?

> setdiff(thing1, thing2)
[1] 1

You could try the reverse: what is in thing2 that is not in thing1?

> setdiff(thing2, thing1)
[1] 4
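
To make the asymmetry concrete, here is a minimal sketch that writes the same one-directional difference by hand with %in%. The helper name asym_diff is made up for illustration, and this is not a claim about how base R implements setdiff() internally.

```r
thing1 <- c(1, 2, 3)
thing2 <- c(2, 3, 4)

# Hypothetical helper: keep the unique elements of x that do not appear in y.
asym_diff <- function(x, y) unique(x[!x %in% y])

asym_diff(thing1, thing2)  # elements found only in thing1
asym_diff(thing2, thing1)  # elements found only in thing2
```

Swapping the arguments changes which vector is being filtered, which is why the two calls disagree.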

But it looks to me like the question you're asking is:

What elements of thing1 and thing2 are not shared?

Which is the same as:

What elements are in the union of thing1 and thing2, but not in the intersection of the two?

> setdiff(union(thing1, thing2), intersect(thing1, thing2))
[1] 1 4
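
If you need this more than once, it may be convenient to wrap the idea in a small function. The sketch below is one possible way to do it in base R; sym_diff is a hypothetical name, not a built-in, and it combines the two one-directional differences rather than using union and intersect.

```r
thing1 <- c(1, 2, 3)
thing2 <- c(2, 3, 4)

# Hypothetical helper (not part of base R): symmetric set difference,
# i.e. elements that appear in exactly one of x and y.
sym_diff <- function(x, y) {
  union(setdiff(x, y), setdiff(y, x))
}

sym_diff(thing1, thing2)
sym_diff(thing2, thing1)  # same elements, possibly in a different order
```

The result contains the same elements whichever way round the arguments go, although their order can differ. If the inputs are data.table objects rather than plain vectors, data.table's fsetdiff() and funion() should allow the same construction on rows; that variant is not shown here.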
sebastian-c answered Oct 20 '22 15:10