I'm trying to identify the values in a data frame that do not match, but can't figure out how to do this.
# make data frame
a <- data.frame( x = c(1,2,3,4))
b <- data.frame( y = c(1,2,3,4,5,6))
# select only values from b that are not in 'a'
# attempt 1:
results1 <- b$y[ !a$x ]
# attempt 2:
results2 <- b[b$y != a$x,]
If a = c(1,2,3)
this works, as a
is a multiple of b
. However, I'm trying to just select all the values from data frame y
, that are not in x
, and don't understand what function to use.
To select a specific column, you can also type in the name of the dataframe, followed by a $ , and then the name of the column you are looking to select. In this example, we will be selecting the payment column of the dataframe. When running this script, R will simplify the result as a vector.
You can use the following basic syntax to select all elements that are not in a list of values in R: ! (data %in% c(value1, value2, value3, ...))
If I understand correctly, you need the negation of the %in%
operator. Something like this should work:
subset(b, !(y %in% a$x))
> subset(b, !(y %in% a$x))
y
5 5
6 6
Try the set difference function setdiff
. So you would have
results1 = setdiff(a$x, b$y) # elements in a$x NOT in b$y
results2 = setdiff(b$y, a$x) # elements in b$y NOT in a$x
You could also use dplyr
for this task. To find what is in b
but not a
:
library(dplyr)
anti_join(b, a, by = c("y" = "x"))
# y
# 1 5
# 2 6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With