My question concerns the practical difference between the == and %in% operators in R.
I have run into an instance at work where filtering with one operator gives different results from filtering with the other (e.g. one returns 800 rows and the other 1200). I have run into this problem in the past and was able to validate the output so that I got the results I wanted. However, I am still stumped as to how the two operators differ.
Can someone please shed some light on how these operators are different?
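What is the difference between the == and %in% operators in R? The %in% operator is used for value matching: it returns a logical vector indicating whether each element of its first argument has a match in its second. (The phrase "returns a vector of the positions of (first) matches of its first argument in its second" describes match(), which is documented on the same help page.) The == operator, on the other hand, is a relational operator that compares two vectors element by element for equality.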
The equality operator ==. Relational operators, or comparators, are operators that show how one R object relates to another. For example, you can check whether two objects are equal by using a double equals sign, ==.
The %% operator is a different operator from %in%: its result is the remainder of a division, e.g. 75 %% 4 is 3. If both operands are positive and the dividend is smaller than the divisor, R returns the dividend itself, e.g. 3 %% 4 is 3.
%in% is value matching. It is documented together with match(), which "returns a vector of the positions of (first) matches of its first argument in its second" (see help('%in%')); %in% itself returns a logical vector indicating whether each element of its first argument has a match in its second. This means you can compare vectors of different lengths to see whether each element of one vector matches at least one element of the other. The length of the output equals the length of the vector being compared (the first one).
1:2 %in% rep(1:2,5)
#[1] TRUE TRUE
rep(1:2,5) %in% 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# Note that the output is longer in the second case
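To make the relationship between match() and %in% concrete, here is a small sketch. The vectors x and lookup are made up for illustration; the last expression is the equivalent formulation given in help('%in%').

```r
x <- c("a", "z", "b")
lookup <- c("b", "a", "a")

# match() gives the position of the first match in lookup, or NA if there is none
positions <- match(x, lookup)
positions
#[1]  2 NA  1

# %in% only reports whether a match exists
found <- x %in% lookup
found
#[1]  TRUE FALSE  TRUE

# Equivalent formulation of %in% in terms of match()
found_via_match <- match(x, lookup, nomatch = 0) > 0
found_via_match
#[1]  TRUE FALSE  TRUE
```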
== is a relational operator that tests whether elements are exactly equal. If the vectors are of equal length, they are compared element-wise. If not, the shorter vector is recycled. The length of the output equals the length of the longer vector.
1:2 == rep(1:2,5)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
rep(1:2,5) == 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
1:10 %in% 3:7
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
# is the same as
sapply(1:10, function(a) any(a == 3:7))
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
NOTE: If possible, use identical or all.equal instead of == when you want to test whether two whole objects are equal.
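A short sketch of why that note matters, using made-up values: == compares doubles exactly, all.equal allows a small numeric tolerance, and identical also checks the type. Wrapping all.equal in isTRUE is a common idiom because all.equal returns a description of the differences, not FALSE, when the objects differ.

```r
a <- 0.1 + 0.2
b <- 0.3

exact <- a == b                      # exact comparison of doubles
exact
#[1] FALSE

close <- isTRUE(all.equal(a, b))     # equal within a small numeric tolerance
close
#[1] TRUE

same_vec <- identical(c(1, 2), c(1, 2))   # one TRUE/FALSE for the whole object
same_vec
#[1] TRUE

same_type <- identical(1L, 1)        # integer vs double: not identical
same_type
#[1] FALSE

1L == 1                              # == coerces before comparing
#[1] TRUE
```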
Given two vectors, x and y, the code x == y will compare the first element of x with the first element of y, then the second element of x with the second element of y, and so on. When using x == y, the lengths of x and y should normally be the same; if they are not, R silently recycles the shorter vector, which is rarely what you want when filtering. Here, compare means "is equal to", and therefore the output is a logical vector whose length equals that of x (or y).
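What happens when the lengths differ can be sketched as follows (the vectors are made up for illustration). R does not stop with an error; it recycles the shorter vector, and only warns when the longer length is not a multiple of the shorter one.

```r
x <- 1:6
y <- c(1, 2)

# y is recycled to 1 2 1 2 1 2 before the element-wise comparison
eq_multiple <- x == y
eq_multiple
#[1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# 5 is not a multiple of 2: R still returns a result, but also issues a warning
# that the longer object length is not a multiple of the shorter object length
z <- 1:5
eq_partial <- z == y
length(eq_partial)
#[1] 5
```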
In the code x %in% y, the first element of x is compared to all elements of y, then the second element of x is compared to all elements of y, and so on. Here, compare means "is the current element of x equal to any value in y", and therefore the output is a logical vector that has the same length as x and not (necessarily) y.
Here is a code snippet illustrating the difference. Note that x and y have the same lengths but the elements of y are the elements of x in different order. Note too in the final examples that x is a 3-element vector being compared to the letters vector, which contains 26 elements.
> x <- c('a','b','c')
> y <- c('c', 'b', 'a')
> x == y
[1] FALSE TRUE FALSE
> x %in% y
[1] TRUE TRUE TRUE
> x %in% letters
[1] TRUE TRUE TRUE
> letters %in% x
[1] TRUE TRUE TRUE FALSE FALSE FALSE
[7] FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE
[19] FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE
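Tying this back to the filtering problem in the question, here is a minimal sketch with a made-up data frame (df, group and keep are hypothetical names, not taken from the original data). It shows how the two operators can select different numbers of rows when the right-hand side holds more than one value.

```r
df <- data.frame(
  id    = 1:6,
  group = c("b", "a", "a", "b", "c", "a"),
  stringsAsFactors = FALSE
)
keep <- c("a", "b")

# == recycles keep to a b a b a b and compares position by position
by_equal <- df[df$group == keep, ]
nrow(by_equal)
#[1] 2

# %in% asks, for each row, whether group is any of the values in keep
by_in <- df[df$group %in% keep, ]
nrow(by_in)
#[1] 5

# Missing values are another source of differences
NA == "a"
#[1] NA
NA %in% "a"
#[1] FALSE
```

When the intent is "keep rows whose value is one of several", %in% is the operator that expresses it; == is appropriate when comparing against a single value or a vector of the same length.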