Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Prevent dplyr from joining on NA's

Tags:

r

dplyr

I'd like to do a full-join of 2 df's. To my surprise, dplyr's default behavior is to join on NA's if they exist in both df's. Is there a functionality to prevent dplyr from doing this?

Here's an example with inner join:

x <- data.frame(a = c(5, NA, 9), b = 1:3)
y <- data.frame(a = c(5, NA, 9), c = 4:6)
z <- dplyr::inner_join(x, y, by = 'a')

I would like z to contain only 2 records, not 3. Ideally, I want to do this without having to manually filter out the records with NA's beforehand and then append them to the result (since that seems clumsy).

like image 220
matsuo_basho Avatar asked Sep 11 '17 17:09

matsuo_basho


1 Answers

You can use na_matches = "never". This is in the NEWS for v. 0.7.0 but I don't see it in the documentation. The default is na_matches = "na".

This returns two rows instead of three:

dplyr::inner_join(x, y, by = 'a', na_matches = "never")

  a b c
1 5 1 4
2 9 3 6
like image 166
aosmith Avatar answered Oct 26 '22 01:10

aosmith