How to merge two dataframes using multiple columns as key?

Tags: merge, dataframe, r, compound-key

Say I have the following dataframes:

DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM"= rep(rnorm(36,10,1)),
                  "TEST" = rep(NA,36))

DF2 <- data.frame("A" = rep("A",6),
                  "B" = rep(c("C","D"),6),
                  "VAL" = rep(c(1,3),3))

Note: Each unique combination of variables A and B in DF2 should have a unique VAL.

For each row of DF1, I would like to replace the NA in TEST with the corresponding value of VAL from DF2 if the values in column A match and the values in column B match between the two data frames. Otherwise, I'd leave TEST as NA. How would I do this without looping through each combination using match?

Ideally, an answer would scale to two data frames with many columns to match upon.

374

asked Mar 23 '15 14:03

goldisfine

2 Answers

# this is your DF1    
DF1 <- data.frame("A" = rep(c("A","B"), 18),
                      "B" = rep(c("C","D","E"), 12),
                      "NUM"= rep(rnorm(36,10,1)),
                      "TEST" = rep(NA,36))

#this is a DF2 i created, with unique A, B, VAL
DF2 <- data.frame("A" = rep(c("A","B"),3),
                  "B" = rep(c("C","D","E"),2),
                  "VAL" = rep(1:6))

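# and this is the answer of what I assume you want
# merge() sorts its result by the key columns, so the rows of tmp are not in
# DF1's original order; use the merged frame as the result instead of copying
# a column back into DF1 by position
tmp <- merge(DF1, DF2, by = c("A", "B"), all.x = TRUE, all.y = FALSE)
tmp$TEST <- tmp$VAL
DF1 <- tmp[c("A", "B", "NUM", "TEST")]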
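
One thing to watch with this approach: merge() does not keep the rows of its first argument in their original order (by default it sorts on the key columns), so a column copied back by position can end up misaligned. If DF1's row order matters, here is a minimal sketch of an order-preserving variant. The helper column row_id is a made-up name, not part of the original data, and set.seed() is only there to make the example reproducible.

```r
set.seed(1)  # reproducibility only
DF1 <- data.frame(A = rep(c("A", "B"), 18),
                  B = rep(c("C", "D", "E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
DF2 <- data.frame(A = rep(c("A", "B"), 3),
                  B = rep(c("C", "D", "E"), 2),
                  VAL = 1:6)

DF1$row_id <- seq_len(nrow(DF1))        # hypothetical helper: remember row order
tmp <- merge(DF1, DF2, by = c("A", "B"), all.x = TRUE)
tmp <- tmp[order(tmp$row_id), ]         # put the merged rows back in DF1's order
DF1$TEST <- tmp$VAL                     # positions now line up with DF1
DF1$row_id <- NULL                      # drop the helper column again
```

This relies on DF2 having at most one row per A/B combination; with duplicate keys the merge would return more rows than DF1 has.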

190

answered Sep 23 '22 15:09

RHA

As Akrun mentioned in comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current dataframe, this isn't a problem, but you will need additional rules if there are multiple possible values for the same combination. From there, the solution is easy:

DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all.x = TRUE)

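Note that this will produce a new dataframe with an empty TEST column (all values NA), and a VAL column assigned from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code. Because merge() sorts its result by the key columns, the merged rows are not in DF1's original order, so take the merged result as the new DF1 rather than copying VAL back into the old one:

DF1 <- merge(DF1[c("A", "B", "NUM")], DF2.u, all.x = TRUE)
names(DF1)[names(DF1) == "VAL"] <- "TEST"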
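
If you would rather fill TEST in place and leave DF1's row order untouched, one alternative is a single vectorised lookup on a pasted key. This is only a sketch, not part of the answer above: the keys vector and the make_key helper are names made up for illustration, and it assumes the separator character never occurs in the key values.

```r
DF1 <- data.frame(A = rep(c("A", "B"), 18),
                  B = rep(c("C", "D", "E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
DF2.u <- data.frame(A = c("A", "A"), B = c("C", "D"), VAL = c(1, 3))

keys <- c("A", "B")  # extend this vector to match on more columns

# Hypothetical helper: one string key per row, built from the key columns
make_key <- function(df) {
  do.call(paste, c(unname(as.list(df[keys])), sep = "\r"))
}

# match() is vectorised: one call, no loop over combinations
idx <- match(make_key(DF1), make_key(DF2.u))
DF1$TEST <- DF2.u$VAL[idx]  # NA wherever DF1's key is absent from DF2.u
```

Since match() returns the first hit only, this also assumes DF2.u has already been reduced to one row per key.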

EDIT: in response to your question, you can boil down DF2 quite simply if necessary:

DF2$C <- c(1:12) #now unique() won't work
DF2.u <- unique(DF2[1:3])

  A B VAL
1 A C   1
2 A D   3
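
Before relying on the reduced table, it can be worth checking that no A/B combination still maps to more than one VAL. A small sketch of such a check follows. The conflicting row added here and the choice of max as the tie-breaking rule are purely illustrative assumptions, so substitute whatever rule suits your data.

```r
DF2 <- data.frame(A = rep("A", 12),
                  B = rep(c("C", "D"), 6),
                  VAL = rep(c(1, 3), 6))
DF2 <- rbind(DF2, data.frame(A = "A", B = "C", VAL = 7))  # made-up conflict

DF2.u <- unique(DF2[c("A", "B", "VAL")])

# TRUE if some A/B pair still appears with more than one VAL
has_conflict <- anyDuplicated(DF2.u[c("A", "B")]) > 0

# One possible rule (an assumption): keep the largest VAL per A/B combination
DF2.u <- aggregate(VAL ~ A + B, data = DF2.u, FUN = max)
```

After the aggregate() step each A/B pair appears exactly once, so the table is safe to use as a lookup in merge().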

answered Sep 21 '22 15:09

Joe
