I'd like to merge two data frames so that values from df2 overwrite the corresponding values in df1, whether those values are NA or already present. The question "Merge data frames and overwrite values" provides a data.table option, but I'd like to know whether there is a way to do this with dplyr. I've tried all of the _join functions, but none seems to do this.
Here is an example:
df1 <- data.frame(y = c("A", "B", "C", "D"), x1 = c(1, 2, NA, 4))
df2 <- data.frame(y = c("A", "B", "C"), x1 = c(5, 6, 7))
Desired output:
y x1
1 A 5
2 B 6
3 C 7
4 D 4
I think what you want is to keep all the values of df2 and add only those rows of df1 whose y is not present in df2, which is what anti_join does:
"anti_join return all rows from x where there are not matching values in y, keeping just columns from x."
My solution:
library(dplyr)
df3 <- anti_join(df1, df2, by = "y") %>% bind_rows(df2)
Warning messages:
1: In anti_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
2: In rbind_all(x, .id) : Unequal factor levels: coercing to character
> df3
Source: local data frame [4 x 2]
y x1
(chr) (dbl)
1 D 4
2 A 5
3 B 6
4 C 7
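This line gives the desired output (in a different order), but pay attention to the warning messages: y was created as a factor with different levels in each data frame, so it gets coerced to character. When working with your own dataset, be sure to read y in as a character variable (for example with stringsAsFactors = FALSE).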
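As a rough sketch of how the pieces fit together without the warnings, the version below builds y as character and sorts the result back into the original order. The stringsAsFactors = FALSE argument, the intermediate name only_in_df1, and the arrange(y) step are my additions for illustration, not part of the original answer. The last line shows rows_update() as a possible alternative; it assumes dplyr 1.0.0 or later is installed, and that every y in df2 also occurs in df1 (which holds for this example).

```r
library(dplyr)

# Same data as in the question, but with y kept as character (assumption:
# stringsAsFactors = FALSE, which avoids the factor-level warnings)
df1 <- data.frame(y = c("A", "B", "C", "D"), x1 = c(1, 2, NA, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(y = c("A", "B", "C"), x1 = c(5, 6, 7),
                  stringsAsFactors = FALSE)

# Step 1: rows of df1 whose key y has no match in df2 (here only y == "D")
only_in_df1 <- anti_join(df1, df2, by = "y")

# Step 2: stack those rows with df2, then sort by y to restore the order
df3 <- only_in_df1 %>%
  bind_rows(df2) %>%
  arrange(y)

# Possible alternative, assuming dplyr >= 1.0.0: overwrite the matching
# rows of df1 with the values from df2, keeping the row order of df1
df4 <- rows_update(df1, df2, by = "y")
```

Both df3 and df4 should then hold the four rows A 5, B 6, C 7, D 4. Note that anti_join plus bind_rows replaces whole rows, so if df1 had extra columns missing from df2, those would come back as NA for the rows taken from df2.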