I have two data frames, A and B. In B I have two key columns and for each row in B I need to count the number of rows in A matching those keys.
I know how to solve the problem using a for-loop but it's taking forever and i was wondering if there is a smarter way to do it. I am still quite new to R so you'll have to forgive me if I'm missing some obvious solution.
The data frames have the following strucuture. Of course, in reality the data frames are much larger.
A <- data.frame(c(1, 2, 1), c(2, 1, 2), c("alpha", "bravo", "charlie"))
colnames(A) <- c("key1", "key2", "value")
B <- data.frame(c(1, 2, 3), c(2, 1, 3), NA)
colnames(B) <- c("key1", "key2", "count")
I used the following for-loop and got the correct result.
for (i in 1:nrow(B)) {
B$count[i] <- sum(A$key1 == B$key1[i] & A$key2 == B$key2[i], na.rm = TRUE)
}
However, the code took a quite while to run and i suspect there is a better way to do this. I would appreciate any help!
An option in base R
(similar to @Sotos tidyverse option)
aggregate(cbind(count = !is.na(value)) ~ key1 + key2, merge(A, B, all = TRUE), sum)
# key1 key2 count
#1 2 1 1
#2 1 2 2
#3 3 3 0
Or with data.table
library(data.table)
setDT(A)[B, .(count = sum(!is.na(value))), on = .(key1, key2), by = .EACHI]
# key1 key2 count
#1: 1 2 2
#2: 2 1 1
#3: 3 3 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With