Can I replace NAs when joining two data frames with dplyr?

Tags:

dplyr

I would like to join two data frames. Some of the column names overlap, and there are NA entries in the overlapping columns of one of the data frames. Here is a simplified example:

df1 <- data.frame(fruit = c('apples','oranges','bananas','grapes'), var1 = c(1,2,3,4), var2 = c(3,NA,6,NA), stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c('oranges','grapes'), var2=c(5,6), var3=c(7,8), stringsAsFactors = FALSE)

Can I use the dplyr join functions to join these data frames and automatically prefer the non-NA entry, so that the "var2" column has no NA entries in the joined data frame? As it stands, left_join keeps the NA entries, and full_join duplicates the rows.

Example Data

> df1
    fruit var1 var2
1  apples    1    3
2 oranges    2   NA
3 bananas    3    6
4  grapes    4   NA
> df2
    fruit var2 var3
1 oranges    5    7
2  grapes    6    8


asked Aug 23 '16 20:08

qdread

2 Answers

coalesce() may be what you need. It returns, position by position, the first non-NA value among the vectors you pass it, so NAs in the first vector are filled with the values at the same positions in the second:

library(dplyr)
df1 %>% 
        left_join(df2, by = "fruit") %>% 
        mutate(var2 = coalesce(var2.x, var2.y)) %>% 
        select(-var2.x, -var2.y)

#     fruit var1 var3 var2
# 1  apples    1   NA    3
# 2 oranges    2    7    5
# 3 bananas    3   NA    6
# 4  grapes    4    8    6
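To see what coalesce() does on its own, here is a minimal sketch on two plain vectors. The names x, y and filled are made up for illustration; the values mirror var2.x and var2.y as they look after the left_join above.

```r
library(dplyr)

x <- c(3, NA, 6, NA)   # var2 as it comes from df1 (var2.x after the join)
y <- c(NA, 5, NA, 6)   # var2 as it comes from df2 (var2.y after the join)

# For each position, keep the first non-NA value, checking x before y
filled <- coalesce(x, y)
filled
```

Because x is checked first, values already present in df1 win; swap the arguments if df2 should take priority instead.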

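Or use data.table, which updates df1 in place (by reference). Note that this update join overwrites var2 and var3 for every row of df1 that has a match in df2, not only where var2 is NA; it gives the desired result here because the matched rows are exactly the ones with NA: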

library(data.table)
setDT(df1)[setDT(df2), on = "fruit", `:=` (var2 = i.var2, var3 = i.var3)]
df1
#      fruit var1 var2 var3
# 1:  apples    1    3   NA
# 2: oranges    2    5    7
# 3: bananas    3    6   NA
# 4:  grapes    4    6    8
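If df2 could also contain rows whose var2 is already filled in df1, a variant that only fills the NAs might look like the sketch below. It assumes a data.table version that provides fcoalesce() (1.12.4 or later), and the extra "apples" row in dt2 is hypothetical, added only to show that an existing value is kept. The names dt1 and dt2 are also made up for this example.

```r
library(data.table)

dt1 <- data.table(fruit = c("apples", "oranges", "bananas", "grapes"),
                  var1  = c(1, 2, 3, 4),
                  var2  = c(3, NA, 6, NA))

# Hypothetical lookup table: "apples" already has a var2 value in dt1
dt2 <- data.table(fruit = c("apples", "oranges", "grapes"),
                  var2  = c(99, 5, 6),
                  var3  = c(9, 7, 8))

# x.var2 is dt1's column, i.var2 is dt2's; fcoalesce keeps x.var2 unless it is NA
dt1[dt2, on = "fruit", `:=`(var2 = fcoalesce(x.var2, i.var2), var3 = i.var3)]
dt1
```

Here apples keeps its original var2 of 3 rather than being overwritten with 99, while oranges and grapes are filled from dt2.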


answered Oct 15 '22 10:10

Psidom

Using purrr together with dplyr might be a solution that extends to multiple columns:

library(purrr)
library(dplyr)

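# After the join the columns are: fruit, var1, var2.x, var2.y, var3
df <- left_join(df1, df2, by = "fruit")
# Take var2.y wherever var2.x is NA, then re-attach the remaining columns
map2_dfr(df[3], df[4], ~ ifelse(is.na(.x), .y, .x)) %>%
  bind_cols(df[c(1, 2, 5)], .)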

    fruit var1 var3 var2.x
1  apples    1   NA      3
2 oranges    2    7      5
3 bananas    3   NA      6
4  grapes    4    8      6
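When several columns overlap, selecting them by position gets fragile. One possible generalization, sketched here rather than taken from the answer above, is to work out the shared column names and coalesce each ".x"/".y" pair in a loop. The variables joined and shared are names chosen for this example, and the suffixes are passed explicitly so the loop does not depend on the defaults.

```r
library(dplyr)

df1 <- data.frame(fruit = c("apples", "oranges", "bananas", "grapes"),
                  var1 = c(1, 2, 3, 4), var2 = c(3, NA, 6, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c("oranges", "grapes"),
                  var2 = c(5, 6), var3 = c(7, 8),
                  stringsAsFactors = FALSE)

joined <- left_join(df1, df2, by = "fruit", suffix = c(".x", ".y"))

# Columns present in both inputs, excluding the join key
shared <- setdiff(intersect(names(df1), names(df2)), "fruit")

for (col in shared) {
  x_col <- paste0(col, ".x")
  y_col <- paste0(col, ".y")
  # Prefer the value from df1 and fall back to df2 where df1 has NA
  joined[[col]] <- coalesce(joined[[x_col]], joined[[y_col]])
  # Drop the suffixed helper columns once they are merged
  joined[[x_col]] <- NULL
  joined[[y_col]] <- NULL
}
joined
```

With the example data only var2 is shared, so the loop runs once and the merged var2 column ends up last, as in the coalesce answer.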

answered Oct 15 '22 09:10

José
