I have two dataframes (df1, df2). I want to fill in the AGE and SEX values from df1 to df2 conditioned on having the same ID between the two. I tried several ways using for-loop and checking subject ID match between the two data frame but I failed. The result should be as in the df3. I have a huge dataset, so I want a piece of code in R that can do this easily. I would appreciate your assistance in this. Thank you.
df1:
ID AGE SEX
90901 39 0
90902 28 0
90903 40 1
df2:
ID AGE SEX Conc
90901 NA NA 5
90901 NA NA 10
90901 NA NA 15
90903 NA NA 30
90903 NA NA 5
90902 NA NA 2.45
90902 NA NA 51
90902 NA NA 1
70905 NA NA 0.5
result:
df3:
ID AGE SEX Conc
90901 39 0 5
90901 39 0 10
90901 39 0 15
90903 40 1 30
90903 40 1 5
90902 28 1 2.45
90902 28 0 51
90902 28 0 1
70905 NA NA 0.5
To combine two data frames with same columns in R language, call rbind() function, and pass the two data frames, as arguments. rbind() function returns the resulting data frame created from concatenating the given two data frames. For rbind() function to combine the given data frames, the column names must match.
Method 1 : Using plyr package rbind. fill() method in R is an enhancement of the rbind() method in base R, is used to combine data frames with different columns. The column names are number may be different in the input data frames. Missing columns of the corresponding data frames are filled with NA.
You could use match
with lapply
for this. If we iterate [[
with matching on the ID
column of each of the original data sets over a vector of names, we can get the desired result.
nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
# ID AGE SEX Conc
# 1 90901 39 0 5.00
# 2 90901 39 0 10.00
# 3 90901 39 0 15.00
# 4 90903 40 1 30.00
# 5 90903 40 1 5.00
# 6 90902 28 0 2.45
# 7 90902 28 0 51.00
# 8 90902 28 0 1.00
# 9 70905 NA NA 0.50
Note that this is also quite a bit faster than merge
.
Try merge(df1, df2, by = "id")
. This will merge your two data frames together. If your example is a good representation of your actual data, then you might want to go ahead and drop the age and sex columns from df2 before you merge.
df2$AGE <- NULL
df2$SEX <- NULL
df3 <- merge(df1, df2, by = "id")
If you need to keep rows from df2 even when you don't have a matching id in df1, then you do this:
df2 <- subset(df2, select = -c(AGE,SEX) )
df3 <- merge(df1, df2, by = "id", all.y = TRUE)
You can learn more about merge
(or any r function) by typing ?merge()
in your r console.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With