Filling in columns with matching IDs from two data frames in R

Tags:

r

I have two data frames (df1 and df2). I want to fill in the AGE and SEX values in df2 from df1 wherever the ID matches between the two. I tried several approaches using a for loop that checks for matching subject IDs between the two data frames, but none of them worked. The result should look like df3 below. My dataset is huge, so I am looking for R code that does this efficiently. I would appreciate any help. Thank you.

df1:
ID    AGE   SEX
90901   39  0
90902   28  0
90903   40  1

df2:
ID     AGE  SEX  Conc
90901   NA  NA    5
90901   NA  NA    10
90901   NA  NA    15
90903   NA  NA    30
90903   NA  NA    5
90902   NA  NA    2.45
90902   NA  NA    51
90902   NA  NA    1
70905   NA  NA    0.5

result:
df3:
ID     AGE  SEX  Conc
90901   39   0    5
90901   39   0    10
90901   39   0    15
90903   40   1    30
90903   40   1    5
90902   28   0    2.45
90902   28   0    51
90902   28   0    1
70905   NA  NA    0.5
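For anyone who wants to try the answers, here is a sketch that rebuilds the example tables as R data frames. It assumes all columns are numeric and that the NA columns in df2 are only empty placeholders.

```r
# Example data from the question, entered as data frames
df1 <- data.frame(
  ID  = c(90901, 90902, 90903),
  AGE = c(39, 28, 40),
  SEX = c(0, 0, 1)
)

# AGE and SEX start out empty; a single NA is recycled to every row
df2 <- data.frame(
  ID   = c(90901, 90901, 90901, 90903, 90903, 90902, 90902, 90902, 70905),
  AGE  = NA,
  SEX  = NA,
  Conc = c(5, 10, 15, 30, 5, 2.45, 51, 1, 0.5)
)
```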


asked Aug 28 '14 01:08

Amer

2 Answers

You could use match() with lapply() for this. For each name in a vector of column names, match() finds the row of df1 whose ID equals each ID in df2, and [[ pulls that column's values in that order. IDs with no match in df1 get NA.

nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
#      ID AGE SEX  Conc
# 1 90901  39   0  5.00
# 2 90901  39   0 10.00
# 3 90901  39   0 15.00
# 4 90903  40   1 30.00
# 5 90903  40   1  5.00
# 6 90902  28   0  2.45
# 7 90902  28   0 51.00
# 8 90902  28   0  1.00
# 9 70905  NA  NA  0.50

Note that this is generally faster than merge() on large data, and it keeps the rows of df2 in their original order.
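To see what match() contributes, here it is on its own with a few IDs from the question (ids is a made-up name for illustration). It returns the position of the first matching ID in df1, or NA when there is none, and indexing a column with those positions performs the lookup.

```r
df1 <- data.frame(ID = c(90901, 90902, 90903), AGE = c(39, 28, 40), SEX = c(0, 0, 1))
ids <- c(90901, 90903, 70905)  # hypothetical IDs to look up

# Position of each id's first match in df1$ID (NA if not found)
idx <- match(ids, df1$ID)
idx
# [1]  1  3 NA

# Index a df1 column with those positions; NA positions give NA
df1$AGE[idx]
# [1] 39 40 NA
```

Because only the first match is used, this approach assumes each ID appears at most once in df1.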


answered Sep 25 '22 21:09

Rich Scriven

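Try merge(df1, df2, by = "ID"). This joins the two data frames on their ID column (note that column names are case-sensitive, so by = "id" would fail here). If your example is representative of your actual data, drop the empty AGE and SEX columns from df2 before merging; otherwise merge() keeps both copies and renames them AGE.x/AGE.y and SEX.x/SEX.y.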

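# Drop the empty placeholder columns from df2
df2$AGE <- NULL
df2$SEX <- NULL
# Inner join: only IDs present in both data frames are kept
df3 <- merge(df1, df2, by = "ID")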

By default merge() performs an inner join, so rows of df2 whose ID has no match in df1 (such as 70905) are dropped. To keep those rows, with NA for AGE and SEX, use all.y = TRUE:

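# Drop the empty AGE and SEX columns (an alternative to the $<- NULL approach above)
df2 <- subset(df2, select = -c(AGE, SEX))
# Keep every row of df2, filling NA where df1 has no matching ID
df3 <- merge(df1, df2, by = "ID", all.y = TRUE)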
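merge() sorts its result by the by column by default, so df3 will not follow df2's original row order. Here is a minimal sketch of one way to restore that order, assuming a helper column named row_id (a hypothetical name, not part of merge()) that is added before merging and dropped afterwards.

```r
df1 <- data.frame(ID = c(90901, 90902, 90903), AGE = c(39, 28, 40), SEX = c(0, 0, 1))
df2 <- data.frame(
  ID   = c(90901, 90901, 90901, 90903, 90903, 90902, 90902, 90902, 70905),
  Conc = c(5, 10, 15, 30, 5, 2.45, 51, 1, 0.5)
)

# Record df2's original row positions in a hypothetical helper column
df2$row_id <- seq_len(nrow(df2))

# Keep every df2 row, even IDs missing from df1 (e.g. 70905)
df3 <- merge(df1, df2, by = "ID", all.y = TRUE)

# Put rows back in df2's order, then drop the helper column
df3 <- df3[order(df3$row_id), c("ID", "AGE", "SEX", "Conc")]
rownames(df3) <- NULL
df3
```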

You can learn more about merge() (or any R function) by typing ?merge in your R console.

answered Sep 25 '22 21:09

jed
