Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform "countif" for several rows in data frame?

I have two data frames, A and B. In B I have two key columns and for each row in B I need to count the number of rows in A matching those keys.

I know how to solve the problem using a for-loop but it's taking forever and i was wondering if there is a smarter way to do it. I am still quite new to R so you'll have to forgive me if I'm missing some obvious solution.

The data frames have the following strucuture. Of course, in reality the data frames are much larger.

A <- data.frame(c(1, 2, 1), c(2, 1, 2), c("alpha", "bravo", "charlie")) 
colnames(A) <- c("key1", "key2", "value")

B <- data.frame(c(1, 2, 3), c(2, 1, 3), NA)
colnames(B) <- c("key1", "key2", "count")

I used the following for-loop and got the correct result.

for (i in 1:nrow(B)) {
  B$count[i] <- sum(A$key1 == B$key1[i] & A$key2 == B$key2[i], na.rm = TRUE)
}

However, the code took a quite while to run and i suspect there is a better way to do this. I would appreciate any help!

like image 836
erikfjonsson Avatar asked Dec 23 '22 01:12

erikfjonsson


1 Answers

An option in base R (similar to @Sotos tidyverse option)

aggregate(cbind(count = !is.na(value)) ~ key1 + key2, merge(A, B, all = TRUE), sum)
#    key1 key2 count
#1    2    1     1
#2    1    2     2
#3    3    3     0

Or with data.table

library(data.table)
setDT(A)[B, .(count = sum(!is.na(value))), on = .(key1, key2), by = .EACHI]
#   key1 key2 count
#1:    1    2     2
#2:    2    1     1
#3:    3    3     0
like image 99
akrun Avatar answered Dec 30 '22 21:12

akrun