I have two data.tables. For each combination of values in the rows of one table, I would like to count the number of matching rows in the other table. I have checked the data.table documentation but have not found an answer. I am using data.table 1.9.2.
DT1 <- data.table(a=c(3,2), b=c(8,3))
DT2 <- data.table(w=c(3,3,3,2,3), x=c(8,8,8,3,7), z=c(2,6,7,2,2))
DT1
# a b
# 1: 3 8
# 2: 2 3
DT2
# w x z
# 1: 3 8 2
# 2: 3 8 6
# 3: 3 8 7
# 4: 2 3 2
# 5: 3 7 2
Now I would like to count the number of (3, 8) pairs and (2, 3) pairs in DT2.
setkey(DT2, w, x)
nrow(DT2[J(3, 8), nomatch=0])
# [1] 3 ## OK !
nrow(DT2[J(2, 3), nomatch=0])
# [1] 1 ## OK !
DT1[,count_combination_in_dt2 := nrow(DT2[J(a, b), nomatch=0])]
DT1
# a b count_combination_in_dt2
# 1: 3 8 4 ## not ok.
# 2: 2 3 4 ## not ok.
Expected result:
# a b count_combination_in_dt2
# 1: 3 8 3
# 2: 2 3 1
setkey(DT2, w, x)
DT2[DT1, .N, by = .EACHI]
# w x N
#1: 3 8 3
#2: 2 3 1
# In versions <= 1.9.2, use DT2[DT1, .N] instead
The above simply does the merge and counts the number of rows for each group defined by the i-expression, thus by = .EACHI.
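If the counts should end up as a column of DT1, as in the question, something like the following might work. It is only a sketch: it assumes data.table 1.9.6 or later (for the on= argument, which avoids setting a key), and it assumes the by = .EACHI result has one row per row of DT1, in DT1's order, which holds for this example where every combination has a match. The name counts is just an illustrative variable.

```r
library(data.table)

DT1 <- data.table(a = c(3, 2), b = c(8, 3))
DT2 <- data.table(w = c(3, 3, 3, 2, 3), x = c(8, 8, 8, 3, 7), z = c(2, 6, 7, 2, 2))

# Join DT2 to DT1 on (w, x) == (a, b) and count the matching DT2 rows
# once per row of DT1 (by = .EACHI).
counts <- DT2[DT1, .N, on = c(w = "a", x = "b"), by = .EACHI]

# Assumption: counts has one row per DT1 row, in the same order,
# so N can be assigned directly as a new column.
DT1[, count_combination_in_dt2 := counts$N]
print(DT1)
```

For the example data this should give 3 for the (3, 8) row and 1 for the (2, 3) row.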
You just need to add by=list(a,b).
DT1[,count_combination_in_dt2:=nrow(DT2[J(a,b),nomatch=0]), by=list(a,b)]
DT1
##
## a b count_combination_in_dt2
## 1: 3 8 3
## 2: 2 3 1
EDIT: Some more details: In your original version, you effectively used DT2[DT1, nomatch=0], because without by, J(a,b) receives all a, b combinations at once. If you want to use J(a,b) for each a, b combination separately, you need to use the by argument. The data.table is then grouped by a, b and the nrow(...) is evaluated within each group.
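The difference can be seen outside of DT1 as well. The sketch below reuses the question's data; the variable names all_at_once and per_pair are only illustrative. It compares passing both combinations to J() in a single call with looking each one up separately.

```r
library(data.table)

DT2 <- data.table(w = c(3, 3, 3, 2, 3), x = c(8, 8, 8, 3, 7), z = c(2, 6, 7, 2, 2))
setkey(DT2, w, x)

# Without by, a and b are whole columns, so J() sees both combinations
# and nrow() counts every matching row of DT2 together.
all_at_once <- nrow(DT2[J(c(3, 2), c(8, 3)), nomatch = 0])

# With by = list(a, b), the same expression runs once per combination.
per_pair <- c(nrow(DT2[J(3, 8), nomatch = 0]),
              nrow(DT2[J(2, 3), nomatch = 0]))

print(all_at_once)  # 4, the value that was recycled into both rows
print(per_pair)     # 3 1
```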