Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get number of same individuals for different groups

Tags:

r

data.table

sum

I have a data set with individuals (ID) that can be part of more than one group.

Example:

library(data.table)
DT <- data.table(
  ID = rep(1:5, c(3:1, 2:3)),
  Group = c("A", "B", "C", "B",
            "C", "A", "A", "C",
            "A", "B", "C")
)
DT
#     ID Group
#  1:  1     A
#  2:  1     B
#  3:  1     C
#  4:  2     B
#  5:  2     C
#  6:  3     A
#  7:  4     A
#  8:  4     C
#  9:  5     A
# 10:  5     B
# 11:  5     C

I want to know the sum of identical individuals for 2 groups.

The result should look like this:

  Group.1    Group.2    Sum
    A           B        2
    A           C        3
    B           C        3

Where Sum indicates the number of individuals the two groups have in common.

like image 916
CC89 Avatar asked Dec 19 '22 22:12

CC89


2 Answers

Here's my version:

# size-1 IDs can't contribute; skip
DT[ , if (.N > 1) 
  # simplify = FALSE returns a list;
  #   transpose turns the 3-length list of 2-length vectors
  #   into a length-2 list of 3-length vectors (efficiently)
  transpose(combn(Group, 2L, simplify = FALSE)), by = ID
  ][ , .(Sum = .N), keyby = .(Group.1 = V1, Group.2 = V2)]

With output:

#    Group.1 Group.2 Sum
# 1:       A       B   2
# 2:       A       C   3
# 3:       B       C   3
like image 152
MichaelChirico Avatar answered Dec 21 '22 13:12

MichaelChirico


As of version 1.9.8 (on CRAN 25 Nov 2016), data.table has gained the ability to do non-equi joins. So, a self non-equi join can be used:

library(data.table) # v1.9.8+
setDT(DT)[, Group:= factor(Group)]
DT[DT, on = .(ID, Group < Group), nomatch = 0L, .(ID, x.Group, i.Group)][
  , .N, by = .(x.Group, i.Group)]
   x.Group i.Group N
1:       A       B 2
2:       A       C 3
3:       B       C 3

Explanantion

The non-equi join on ID, Group < Group is a data.table version of combn() (but applied group-wise):

DT[DT, on = .(ID, Group < Group), nomatch = 0L, .(ID, x.Group, i.Group)]
   ID x.Group i.Group
1:  1       A       B
2:  1       A       C
3:  1       B       C
4:  2       B       C
5:  4       A       C
6:  5       A       B
7:  5       A       C
8:  5       B       C
like image 22
Uwe Avatar answered Dec 21 '22 12:12

Uwe