
How can I select rows from a data frame that do not match?

Tags: dataframe, r

I'm trying to identify the values in a data frame that do not match, but can't figure out how to do this.

# make data frame 
a <- data.frame( x =  c(1,2,3,4)) 
b <- data.frame( y =  c(1,2,3,4,5,6))

# select only values from b that are not in 'a'
# attempt 1: 
results1 <- b$y[ !a$x ]

# attempt 2:  
results2 <- b[b$y != a$x,]

If a$x = c(1,2,3) this works, because the length of b$y is a multiple of the length of a$x. However, I just want to select all the values in b$y that are not in a$x, and I don't understand which function to use.

asked Apr 28 '11 01:04 by djq



3 Answers

If I understand correctly, you need the negation of the %in% operator. Something like this should work:

subset(b, !(y %in% a$x))

> subset(b, !(y %in% a$x))
  y
5 5
6 6
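
For comparison, here is a minimal sketch of the same membership test written with bracket indexing instead of subset(). The names not_in_a and rows are only illustrative and do not come from the question; drop = FALSE is included on the assumption that you want a data frame back rather than a bare vector.

```r
a <- data.frame(x = c(1, 2, 3, 4))
b <- data.frame(y = c(1, 2, 3, 4, 5, 6))

# TRUE wherever a value of b$y does not occur anywhere in a$x
not_in_a <- !(b$y %in% a$x)

# drop = FALSE keeps the result a data frame even though b has one column
rows <- b[not_in_a, , drop = FALSE]
print(rows)
```

Unlike `!=`, which compares position by position and recycles the shorter vector, `%in%` checks each value of b$y against all of a$x, so the lengths and ordering of the two columns do not matter.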
answered Oct 14 '22 09:10 by Chase


Try the set difference function, setdiff. The argument order matters, so you would have:

results1 = setdiff(a$x, b$y)   # elements in a$x NOT in b$y
results2 = setdiff(b$y, a$x)   # elements in b$y NOT in a$x
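
One thing worth keeping in mind: setdiff works on the column vectors and returns the distinct values, not rows of the data frame. The sketch below illustrates the difference using a hypothetical variant of b with a repeated 5; the names only_in_b and rows_only_in_b are made up for this example.

```r
a <- data.frame(x = c(1, 2, 3, 4))
# Hypothetical variant of b with a repeated value, to show de-duplication
b <- data.frame(y = c(1, 2, 3, 4, 5, 5, 6))

# setdiff returns a plain vector of the distinct values in b$y not in a$x
only_in_b <- setdiff(b$y, a$x)

# %in% keeps every matching row, including the repeated 5
rows_only_in_b <- b[!(b$y %in% a$x), , drop = FALSE]

print(only_in_b)
print(rows_only_in_b)
```

If you need the rows themselves, or care about duplicates, the %in% form is probably the better fit; if you only want the set of unmatched values, setdiff is shorter.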
answered Oct 14 '22 08:10 by Ramnath


You could also use dplyr for this task. To find what is in b but not a:

library(dplyr)    
anti_join(b, a, by = c("y" = "x"))

#   y
# 1 5
# 2 6
answered Oct 14 '22 09:10 by Joe