Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Updating a data frame with another data frame

Let's say our initial data frame looks like this:

df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,NA,NA,NA),C=c(1,2,3,NA,NA,NA))

> df1
  Index A  B  C
1     1 1  1  1
2     2 2  2  2
3     3 3  3  3
4     4 4 NA NA
5     5 5 NA NA
6     6 6 NA NA

Another data frame contains new information for col B and C

df2 = data.frame(Index=c(4,5,6),B=c(4,4,4),C=c(5,5,5))

> df2
  Index B C
1     4 4 5
2     5 4 5
3     6 4 5

How can you update the missing values in df1 so it looks like this:

  Index A B C
1     1 1 1 1
2     2 2 2 2
3     3 3 3 3
4     4 4 4 5
5     5 5 4 5
6     6 6 4 5

My attempt:

library(dplyr)

> full_join(df1,df2)
Joining by: c("Index", "B", "C")
  Index  A  B  C
1     1  1  1  1
2     2  2  2  2
3     3  3  3  3
4     4  4 NA NA
5     5  5 NA NA
6     6  6 NA NA
7     4 NA  4  5
8     5 NA  4  5
9     6 NA  4  5

Which as you can see has created duplicate rows for the 4,5,6 index instead of replacing the NA values.

Any help would be greatly appreciated!

like image 339
Zyferion Avatar asked Dec 05 '22 00:12

Zyferion


2 Answers

merge then aggregate:

aggregate(. ~ Index, data=merge(df1, df2, all=TRUE), na.omit, na.action=na.pass )

#  Index B C A
#1     1 1 1 1
#2     2 2 2 2
#3     3 3 3 3
#4     4 4 5 4
#5     5 4 5 5
#6     6 4 5 6

Or in dplyr speak:

df1 %>% 
    full_join(df2) %>%
    group_by(Index) %>%
    summarise_each(funs(na.omit))

#Joining by: c("Index", "B", "C")
#Source: local data frame [6 x 4]
#
#  Index     A     B     C
#  (dbl) (int) (dbl) (dbl)
#1     1     1     1     1
#2     2     2     2     2
#3     3     3     3     3
#4     4     4     4     5
#5     5     5     4     5
#6     6     6     4     5
like image 87
thelatemail Avatar answered Dec 06 '22 13:12

thelatemail


We can use join from data.table. Convert the 'data.frame' to 'data.table' (setDT(df1), join on with 'df1' using "Index" and assign (:=), the values in 'B' and 'C' with 'i.B' and 'i.C'.

library(data.table)
setDT(df1)[df2, c('B', 'C') := .(i.B, i.C), on = "Index"]
df1
#   Index A B C
#1:     1 1 1 1
#2:     2 2 2 2
#3:     3 3 3 3
#4:     4 4 4 5
#5:     5 5 4 5
#6:     6 6 4 5
like image 21
akrun Avatar answered Dec 06 '22 14:12

akrun