Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R dataframe: how to "populate" missing data in df1 using df2

I am trying to populate the missing values of df1 with df2. Whenever there is a valid value for the same cell in both df, I need to keep the value as in df1. If there is a column in df2 that is not present in df1, this new column (z) has to be added to df1.

This would be a simple example:

id <- c (1, 2, 3, 4, 5)
x <- c (10, NA, 20, 50, 70)
y <- c (3, 5, NA, 6, 9)
df1 <- data.frame(id, x, y)

id <- c ( 2, 3, 5)
x <- c (10, NA, NA)
z <- c (NA, 6, 7)
df2 <- data.frame(id, x, z)

I would like to obtain "df3":

id  x y  z
1  1 10 3 NA
2  2 10 5 NA
3  3 20 6  6
4  4 50 6 NA
5  5 70 9  7

I tried several "merge" options that didn't work.

like image 569
Jaume Avatar asked Sep 17 '25 03:09

Jaume


1 Answers

A 'merge' option after several extract and replace steps could be

idx <- is.na(df1[df2$id,])
df1[df2$id,][idx] <- df2[idx]
out <- merge(df1, df2[, c("id", "z")], by = "id", all.x = TRUE)

Result

out
#  id  x y  z
#1  1 10 3 NA
#2  2 10 5 NA
#3  3 20 6  6
#4  4 50 6 NA
#5  5 70 9  7
like image 50
markus Avatar answered Sep 18 '25 18:09

markus