Difference between the == and %in% operators in R [duplicate]

My question concerns the practical difference between the == and %in% operators in R.

I have run into a case at work where filtering with one operator gives different results from filtering with the other (e.g. one results in 800 rows and the other in 1200). I have hit this problem before and was able to validate the output so that I got the results I wanted. However, I am still stumped as to how the two operators differ.

Can someone please shed some light on how these operators are different?

asked Mar 06 '17 22:03

R_user1233

2 Answers

%in% performs value matching. It is built on match(), which "returns a vector of the positions of (first) matches of its first argument in its second" (see help('%in%')); %in% itself returns a logical vector indicating whether each element of its first argument has a match in its second. This means you can compare vectors of different lengths to see whether each element of one vector matches at least one element of the other. The length of the output equals the length of the vector being compared (the first one).

1:2 %in% rep(1:2,5)
#[1] TRUE TRUE

rep(1:2,5) %in% 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

# Note that the output is longer in the second case
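To make the link between match() and %in% concrete, here is a small sketch. The vectors x and lookup are made up for illustration and are not part of the original answer.

```r
# Illustrative vectors (hypothetical, chosen only for this example)
x      <- c(2, 5, 9)
lookup <- c(9, 2, 2, 7)

# match() gives the position of the first match, or NA when there is none
pos <- match(x, lookup)
print(pos)

# %in% only reports whether a match exists
found <- x %in% lookup
print(found)

# The two agree: an element is "in" lookup exactly when match() finds a position
print(identical(found, !is.na(pos)))
```

Note that 2 appears twice in lookup but only the first position is reported by match(), and the result always has one entry per element of x.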

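== is a comparison operator that tests whether two things are exactly equal, element by element. If the vectors are of equal length, the first elements are compared with each other, then the second elements, and so on. If not, the shorter vector is recycled to the length of the longer one, and R gives a warning when the longer length is not a multiple of the shorter. The length of the output equals the length of the longer vector.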

1:2 == rep(1:2,5)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

rep(1:2,5) == 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

1:10 %in% 3:7
#[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

# is the same as

sapply(1:10, function(a) any(a == 3:7))
#[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

NOTE: If possible, use identical or all.equal instead of == when you need a single TRUE/FALSE answer about whether two whole objects are equal.
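As a rough illustration of that note, the sketch below compares two numeric vectors that differ only by floating-point rounding. The vectors a and b are invented for this example.

```r
# Hypothetical vectors that differ only by floating-point rounding
a <- c(0.1 + 0.2, 1)
b <- c(0.3, 1)

# == answers element by element, so the result is a vector
eq_elementwise <- a == b
print(eq_elementwise)

# identical() gives one answer and requires an exact match
same_exact <- identical(a, b)
print(same_exact)

# all.equal() allows a small numeric tolerance; wrap it in isTRUE()
# because it returns a description of the difference when the objects differ
same_approx <- isTRUE(all.equal(a, b))
print(same_approx)
```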

answered Oct 11 '22 19:10

d.b

Given two vectors, x and y, the code x == y will compare the first element of x with the first element of y, then the second element of x with the second element of y, and so on. When using x == y, the lengths of x and y should normally be the same; if they are not, R recycles the shorter vector rather than raising an error. Here, compare means "is equal to", so for vectors of the same length the output is a logical vector whose length equals that of x (or y).

In the code x %in% y, the first element of x is compared to all elements of y, then the second element of x is compared to all elements of y, and so on. Here, compare means "is the current element of x equal to any value in y", so the output is a logical vector that has the same length as x and not (necessarily) y.

Here is a code snippet illustrating the difference. Note that x and y have the same length, but the elements of y are the elements of x in a different order. Note too that in the final examples x is a 3-element vector being compared to the letters vector, which contains 26 elements.

> x <- c('a','b','c')
> y <- c('c', 'b', 'a')
> x == y
[1] FALSE  TRUE FALSE

> x %in% y
[1] TRUE TRUE TRUE

> x %in% letters
[1] TRUE TRUE TRUE

> letters %in% x
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
 [7] FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE
[19] FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE
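The same difference explains why filtering can return different numbers of rows. The sketch below uses a small made-up data frame (df, group and wanted are hypothetical names, not the asker's data) and also shows that the two operators treat NA differently.

```r
# Hypothetical data frame standing in for the asker's data
df <- data.frame(
  id    = 1:6,
  group = c("a", "b", "c", "a", "b", NA),
  stringsAsFactors = FALSE
)
wanted <- c("a", "b")

# ==: wanted is recycled as a, b, a, b, a, b and compared position by position;
# which() drops the NA that the comparison produces for the missing group
eq_rows <- df[which(df$group == wanted), ]

# %in%: keeps every row whose group is any of the wanted values
in_rows <- df[df$group %in% wanted, ]

print(nrow(eq_rows))
print(nrow(in_rows))

# NA handling differs too: == returns NA, %in% returns FALSE
print(df$group[6] == "a")
print(df$group[6] %in% "a")
```

When filtering on a set of allowed values, %in% is usually what is intended; == with a multi-element right-hand side silently depends on row order.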

answered Oct 11 '22 19:10

Jaguar

Tags: operators, r, filtering
