
No applicable method for 'anti_join' applied to an object of class "factor"

I want to identify the rows present in dataframe1 that are not present in dataframe2, based on a particular column. I used the code below to get this information.

diffId <- anti_join(dat$ID,datwe$ID)

Unfortunately, I got the following error:

Error in UseMethod("anti_join") :
no applicable method for 'anti_join' applied to an object of class "factor"

I checked the class of the relevant column in both data frames, and it is factor in each. I also tried extracting the column into a separate variable, on the assumption that it might solve the issue, but with no luck:

fac1 <- datwe$ID
fac2 <- dat$ID
diffId <- anti_join(fac2,fac1)

Could you please share your thoughts?

Thanks

Prradep asked Jun 04 '15 08:06



1 Answer

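Almost all dplyr functions operate on tbls (depending on the context, a data.frame, a data.table, a database connection and so on), not on bare vectors such as dat$ID. That is why anti_join has no method for a factor. Pass the data frames themselves and name the key column with by, something like this: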

> library(dplyr)
> dat <- data.frame(ID=c(1, 3, 6, 4), x=runif(4))
> datwe <- data.frame(ID=c(3, 5, 8), y=runif(3))
> anti_join(dat, datwe, by='ID') %>% select(ID)
  ID
1  4
2  6
3  1

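Note that the original row order of dat is not preserved in this output, so do not rely on the order of the result unless you sort it explicitly (for example with arrange(ID)).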

If the ID columns are factors with different levels (unlike the numerics in the example above), the join involves a conversion between factor and character.
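One way to sidestep that implicit conversion is to convert the key to character yourself before joining. This is only a minimal sketch: the letter IDs and the names dat_chr and datwe_chr are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical data: ID is a factor in both tables, with different level sets
dat   <- data.frame(ID = factor(c("a", "c", "f", "d")))
datwe <- data.frame(ID = factor(c("c", "e", "h")))

# Convert the key column to character in both tables before joining,
# so the join does not have to reconcile the two sets of factor levels
dat_chr   <- mutate(dat, ID = as.character(ID))
datwe_chr <- mutate(datwe, ID = as.character(ID))

# Rows of dat whose ID does not appear in datwe
diffId <- anti_join(dat_chr, datwe_chr, by = "ID")
```

If the result needs to be a factor again, it can be converted back afterwards with factor().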

If you want to operate on vectors instead, you can use setdiff (available in both base and dplyr):

> setdiff(dat$ID, datwe$ID)
[1] 1 6 4
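
setdiff only returns the IDs. If you need the full rows without loading dplyr, a base R subset with %in% should also work. A minimal sketch, reusing the IDs from the example above with fixed values in place of runif (the name only_in_dat is just for illustration):

```r
# Same IDs as in the example above; x and y are fixed here for reproducibility
dat   <- data.frame(ID = c(1, 3, 6, 4), x = c(0.1, 0.2, 0.3, 0.4))
datwe <- data.frame(ID = c(3, 5, 8), y = c(0.5, 0.6, 0.7))

# Keep the rows of dat whose ID is not found in datwe;
# drop = FALSE keeps the result a data frame even with a single column
only_in_dat <- dat[!(dat$ID %in% datwe$ID), , drop = FALSE]
```

Because this is plain subsetting, the rows come back in the same order as they appear in dat.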
zero323 answered Nov 15 '22 21:11
