Check whether values in one data frame column exist in a second data frame

Tags: dataframe, r

I have two data frames (A and B), both with a column 'C'. I want to check whether the values in column 'C' of data frame A exist in data frame B.

A = data.frame(C = c(1,2,3,4))
B = data.frame(C = c(1,3,4,7))

asked Dec 08 '12 05:12

user1631306

1 Answer

Use %in% as follows:

A$C %in% B$C

This tells you which values of column C in A are also in B.

The result is a logical vector. For your example, you get:

A$C %in% B$C
# [1]  TRUE FALSE  TRUE  TRUE

You can use this logical vector as an index to the rows of A, or as an index to A$C, to get the actual values:

# as a row index
A[A$C %in% B$C, ]  # note the comma to indicate we are indexing rows

# as an index to A$C
A$C[A$C %in% B$C]  # returns all values of A$C that are in B$C
# [1] 1 3 4
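If you want to keep the membership result alongside the data rather than recompute it, one option is to store it as a column. The following is a minimal sketch using the data from the question; the column name `in_B` and the variable `matched` are made up for this example.

```r
# Same data as in the question
A <- data.frame(C = c(1, 2, 3, 4))
B <- data.frame(C = c(1, 3, 4, 7))

# in_B is a hypothetical column name: TRUE where A$C also appears in B$C
A$in_B <- A$C %in% B$C

# Keep only the rows of A whose C value was found in B$C
matched <- A[A$in_B, ]
print(matched)
```

Storing the flag keeps all rows of A, so you can filter on it later in either direction.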

We can negate it too:

A$C[!A$C %in% B$C]  # returns all values of A$C that are NOT in B$C
# [1] 2
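The negation works without extra parentheses because `%in%` binds more tightly than `!`. As a small sketch on the question's data, the explicit form below gives the same result, and base R's `setdiff()` is a possible alternative when you only need the distinct values (it drops duplicates, which the `%in%` index does not). The variable names are illustrative only.

```r
A <- data.frame(C = c(1, 2, 3, 4))
B <- data.frame(C = c(1, 3, 4, 7))

# Explicit parentheses; equivalent to A$C[!A$C %in% B$C]
not_in_B <- A$C[!(A$C %in% B$C)]

# Set-based alternatives; both return unique values only
only_in_A <- setdiff(A$C, B$C)    # values of A$C missing from B$C
in_both   <- intersect(A$C, B$C)  # values present in both columns

print(not_in_B)
print(only_in_A)
print(in_both)
```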

If you want to know if a specific value is in B$C, use the same function:

2 %in% B$C       # "is the value 2 in B$C ?"
# FALSE

A$C[2] %in% B$C  # "is the 2nd element of A$C in B$C ?"
# FALSE
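If you need a single yes/no answer for the whole column instead of one result per element, the logical vector can be summarised with base R's `any()`, `all()` and `sum()`. This is a sketch on the question's data; the variable names are made up for the example.

```r
A <- data.frame(C = c(1, 2, 3, 4))
B <- data.frame(C = c(1, 3, 4, 7))

hits <- A$C %in% B$C

any_found <- any(hits)  # is at least one value of A$C in B$C?
all_found <- all(hits)  # is every value of A$C in B$C?
n_found   <- sum(hits)  # how many values of A$C are in B$C?

print(c(any = any_found, all = all_found))
print(n_found)
```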

answered Sep 24 '22 15:09

Ricardo Saporta
