I've been searching for a solution and experimenting, but I can't seem to perform what should be a simple task.
I have two data frames formatted similarly to the toy examples below:
DF1 = data.frame(A=c("cats","dogs",NA,"dogs"), B=c("kittens","puppies","kittens",NA), C=c(88,99,101,110))
A B C
1 cats kittens 88
2 dogs puppies 99
3 NA kittens 101
4 dogs NA 110
DF2 = data.frame(D=c(1,2), A=c("cats","dogs"), B=c("kittens","puppies"))
D A B
1 1 cats kittens
2 2 dogs puppies
I wish to merge the two data sets such that the output is:
A B C D
1 cats kittens 88 1
2 dogs puppies 99 2
3 dogs NA 110 2
4 NA kittens 101 1
In other words, any row with A=="cats" or B=="kittens" should be mapped to 1 in column D, and any row with A=="dogs" or B=="puppies" should be mapped to 2.
I have used the command
merge(DF1, DF2, by=c("A","B"), all.x=TRUE)
However, this does not match rows 3 and 4 correctly, only rows 1 and 2. I get the output
A B C D
1 cats kittens 88 1
2 dogs puppies 99 2
3 dogs NA 110 NA
4 NA kittens 101 NA
Please note that the actual datasets I'm working with are very long. In reality, DF1 has over 1,000,000 rows and DF2 has over 300,000 rows, so what I really need is a solution that scales.
Perhaps you can try something along these lines:
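temp <- merge(DF1, DF2, by=c("A","B"), all.x=TRUE)
# Overwrite D: rows mentioning "cats" or "kittens" get 1, all others get 2
within(temp, {
  M1 <- c("cats", "kittens")
  D <- ifelse(A %in% M1 | B %in% M1, 1, 2)
  rm(M1)  # drop the helper so it does not become a column
})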
# A B C D
# 1 cats kittens 88 1
# 2 dogs puppies 99 2
# 3 dogs <NA> 110 2
# 4 <NA> kittens 101 1
You can nest ifelse statements if you need more than just these two options.
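If there are many categories, hard-coding each one in nested ifelse calls may get unwieldy. One possible alternative is to look D up with match(), trying column A first and falling back to column B where A gave no hit. This is only a sketch: it assumes every value of A and of B appears at most once in DF2 (match() returns the first hit only), and the names idx, miss and result are made up for illustration.

```r
DF1 <- data.frame(A = c("cats", "dogs", NA, "dogs"),
                  B = c("kittens", "puppies", "kittens", NA),
                  C = c(88, 99, 101, 110),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(D = c(1, 2),
                  A = c("cats", "dogs"),
                  B = c("kittens", "puppies"),
                  stringsAsFactors = FALSE)

# Row of DF2 whose A equals DF1$A; NA where A is missing or unknown.
# incomparables = NA stops an NA in DF1 from matching an NA in DF2.
idx <- match(DF1$A, DF2$A, incomparables = NA)

# For rows that A could not resolve, try the lookup on B instead.
miss <- is.na(idx)
idx[miss] <- match(DF1$B[miss], DF2$B, incomparables = NA)

# Attach D; rows that matched on neither column keep D = NA.
result <- cbind(DF1, D = DF2$D[idx])
result
```

Unlike merge(), this keeps the rows in the original order of DF1. Since match() is vectorised, it should cope with data frames of the size mentioned in the question, though I have not timed it on data that large.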