Say I have the following dataframes:
DF1 <- data.frame("A" = rep(c("A","B"), 18),
"B" = rep(c("C","D","E"), 12),
"NUM"= rep(rnorm(36,10,1)),
"TEST" = rep(NA,36))
DF2 <- data.frame("A" = rep("A",6),
"B" = rep(c("C","D"),6),
"VAL" = rep(c(1,3),3))
*Note: Each unique combination of variables A
and B
in DF2
should have a unique VAL
.
For each row, I would like to replace the NA
in TEST
with the corresponding value of VAL
in DF1
if the values in columns A
and A
match and the values in columns B
and B
match for that row. Otherwise, I'd leave TEST
as NA
. How would I do this without looping through each combination using match?
Ideally, an answer would scale to two data frames with many columns to match upon.
To merge two pandas DataFrames on multiple columns use pandas. merge() method.
It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.
By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.
# this is your DF1
DF1 <- data.frame("A" = rep(c("A","B"), 18),
"B" = rep(c("C","D","E"), 12),
"NUM"= rep(rnorm(36,10,1)),
"TEST" = rep(NA,36))
#this is a DF2 i created, with unique A, B, VAL
DF2 <- data.frame("A" = rep(c("A","B"),3),
"B" = rep(c("C","D","E"),2),
"VAL" = rep(1:6))
# and this is the answer of what i assume you want
tmp <- merge(DF1,DF2, by=c("A","B"), all.x=TRUE, all.y=FALSE)
DF1[4] <- tmp[5]
As Akrun mentioned in comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current dataframe, this isn't a problem, but you will need additional rules if there are multiple possible values for the same combination. From there, the solution is easy:
DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = T)
Note that this will produce a new dataframe with an empty TEST column (all values NA
), and a VAL column assigned from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code:
DF1$TEST <- merge(DF1, DF2.u, all = T)$VAL
EDIT: in response to your question, you can boil down DF2 if necessary quite simple:
DF2$C <- c(1:12) #now unique() won't work
DF2.u <- unique(DF2[1:3])
A B VAL
1 A C 1
2 A D 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With