Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

filling in columns with matching IDs from two dataframes in R

Tags:

r

I have two dataframes (df1, df2). I want to fill in the AGE and SEX values from df1 to df2 conditioned on having the same ID between the two. I tried several ways using for-loop and checking subject ID match between the two data frame but I failed. The result should be as in the df3. I have a huge dataset, so I want a piece of code in R that can do this easily. I would appreciate your assistance in this. Thank you.

df1:
ID    AGE   SEX
90901   39  0
90902   28  0
90903   40  1

df2:
ID     AGE  SEX  Conc
90901   NA  NA    5
90901   NA  NA    10
90901   NA  NA    15
90903   NA  NA    30
90903   NA  NA    5
90902   NA  NA    2.45
90902   NA  NA    51
90902   NA  NA    1
70905   NA  NA    0.5

result:
df3:
ID     AGE  SEX  Conc
90901   39  0     5
90901   39  0     10
90901   39  0     15
90903   40  1    30
90903   40  1    5
90902   28  1    2.45
90902   28  0    51
90902   28  0     1
70905   NA  NA    0.5
like image 387
Amer Avatar asked Aug 28 '14 01:08

Amer


People also ask

How do I combine two data frames in R with similar columns?

To combine two data frames with same columns in R language, call rbind() function, and pass the two data frames, as arguments. rbind() function returns the resulting data frame created from concatenating the given two data frames. For rbind() function to combine the given data frames, the column names must match.

How do I combine two datasets in R with different columns?

Method 1 : Using plyr package rbind. fill() method in R is an enhancement of the rbind() method in base R, is used to combine data frames with different columns. The column names are number may be different in the input data frames. Missing columns of the corresponding data frames are filled with NA.


2 Answers

You could use match with lapply for this. If we iterate [[ with matching on the ID column of each of the original data sets over a vector of names, we can get the desired result.

nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
#      ID AGE SEX  Conc
# 1 90901  39   0  5.00
# 2 90901  39   0 10.00
# 3 90901  39   0 15.00
# 4 90903  40   1 30.00
# 5 90903  40   1  5.00
# 6 90902  28   0  2.45
# 7 90902  28   0 51.00
# 8 90902  28   0  1.00
# 9 70905  NA  NA  0.50

Note that this is also quite a bit faster than merge.

like image 191
Rich Scriven Avatar answered Sep 25 '22 21:09

Rich Scriven


Try merge(df1, df2, by = "id"). This will merge your two data frames together. If your example is a good representation of your actual data, then you might want to go ahead and drop the age and sex columns from df2 before you merge.

df2$AGE <- NULL
df2$SEX <- NULL
df3 <- merge(df1, df2, by = "id")

If you need to keep rows from df2 even when you don't have a matching id in df1, then you do this:

df2 <- subset(df2, select = -c(AGE,SEX) )
df3 <- merge(df1, df2, by = "id", all.y = TRUE)

You can learn more about merge (or any r function) by typing ?merge() in your r console.

like image 24
jed Avatar answered Sep 25 '22 21:09

jed