How can I select rows from a data frame that do not match?

Tags: dataframe, r

I'm trying to identify the values in a data frame that do not match, but can't figure out how to do this.

# make data frame 
a <- data.frame( x =  c(1,2,3,4)) 
b <- data.frame( y =  c(1,2,3,4,5,6))

# select only values from b that are not in 'a'
# attempt 1: 
results1 <- b$y[ !a$x ]

# attempt 2:  
results2 <- b[b$y != a$x,]

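If a$x = c(1,2,3), attempt 2 appears to work, because the length of b$y is a multiple of the length of a$x and R recycles the shorter vector. However, I just want to select all the values of y in data frame b that are not in x in data frame a, and I don't understand which function to use.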


asked Apr 28 '11 01:04

djq

3 Answers

If I understand correctly, you need the negation of the %in% operator. Something like this should work:

subset(b, !(y %in% a$x))

> subset(b, !(y %in% a$x))
  y
5 5
6 6
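
The same filter can also be written with plain bracket indexing. This is only a sketch, assuming the a and b data frames from the question. Because b has a single column, drop = FALSE is needed to keep the result a data frame rather than a bare vector.

```r
# Data frames from the question
a <- data.frame(x = c(1, 2, 3, 4))
b <- data.frame(y = c(1, 2, 3, 4, 5, 6))

# TRUE for each value of b$y that also appears somewhere in a$x
in_a <- b$y %in% a$x

# Keep the rows of b whose y is NOT in a$x; drop = FALSE keeps a data frame
not_in_a <- b[!in_a, , drop = FALSE]

# If only the values are needed, index the column directly
values_not_in_a <- b$y[!in_a]
```

Unlike b$y != a$x, %in% compares each value of b$y against all of a$x, so the two vectors do not need to have compatible lengths.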

answered Oct 14 '22 09:10

Chase

Try the set difference function setdiff. You would have:

results1 = setdiff(a$x, b$y)   # elements in a$x NOT in b$y
results2 = setdiff(b$y, a$x)   # elements in b$y NOT in a$x
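
One thing to keep in mind is that setdiff works on vectors and returns unique values, so repeated values collapse and the result is a vector rather than a data frame. A small sketch contrasts it with %in%, using a made-up vector with a repeated value (not from the question):

```r
x <- c(1, 2, 3, 4)
y <- c(1, 5, 5, 6)   # hypothetical data: 5 appears twice

# Unique values of y that are not in x: 5 6
unique_not_in_x <- setdiff(y, x)

# Every element of y that is not in x, duplicates kept: 5 5 6
all_not_in_x <- y[!(y %in% x)]
```

If the duplicates or the other columns of the data frame matter, the %in% form is probably the safer choice.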

answered Oct 14 '22 08:10

Ramnath

You could also use dplyr for this task. To find what is in b but not a:

library(dplyr)    
anti_join(b, a, by = c("y" = "x"))

#   y
# 1 5
# 2 6
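
anti_join keeps every column of the first data frame, which helps when b has more than one column. The sketch below assumes dplyr is installed and adds a hypothetical label column to b for illustration; it is not part of the question's data.

```r
library(dplyr)

a <- data.frame(x = c(1, 2, 3, 4))
# 'label' is a made-up extra column, added only for this example
b <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  label = c("p", "q", "r", "s", "t", "u"),
  stringsAsFactors = FALSE
)

# Rows of b with no match in a, comparing b$y against a$x
unmatched <- anti_join(b, a, by = c("y" = "x"))

#   y label
# 1 5     t
# 2 6     u
```

In by = c("y" = "x"), the name on the left refers to the column in the first data frame (b) and the value on the right to the column in the second (a).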

answered Oct 14 '22 09:10

Joe
