How to calculate a table of pairwise counts from a long-form data frame

Question

I have a 'long-form' data frame with columns id (the key identifying each record, repeated once per feature code) and featureCode (a categorical variable). Each record has between 1 and 9 values of the categorical variable. For example:

id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC

I'd like to calculate the number of times each feature code is used together with each other feature code (the "pairwise counts" of the title). At this stage, the order in which the feature codes appear is not important. I envisage the result as another data frame, where the rows and columns are feature codes and the cells are counts. For example:

      PPLC  PCLI  PPL
PPLC  0     3     1
PCLI  3     0     1
PPL   1     1     0
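A table like this is commonly called a co-occurrence matrix. A minimal base-R sketch of one way to compute it (an alternative technique, not taken from the answer below): build an id-by-featureCode incidence table with `table()`, reduce it to presence/absence, and take its cross-product, so that cell [a, b] counts the ids that use both code a and code b. The data is typed in from the example above; the names `dat`, `inc`, `co` and `ord` are illustrative.

```r
# Example data typed in from the question (one row per id/featureCode pair)
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC")
)

# Incidence matrix: rows are ids, columns are feature codes
inc <- unclass(table(dat$id, dat$featureCode))
# Presence/absence, so a code repeated within one id is counted once
inc <- (inc > 0) * 1

# crossprod(inc) is t(inc) %*% inc: cell [a, b] = number of ids using both a and b
co <- crossprod(inc)
# The diagonal would hold the number of ids per code; the question shows 0 there
diag(co) <- 0

# Reorder rows and columns to match the layout shown above
ord <- c("PPLC", "PCLI", "PPL")
co[ord, ord]
```

Dropping the `diag(co) <- 0` line keeps the per-code totals on the diagonal, which can be useful for normalising the counts later.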

Unfortunately, I don't know how to perform this calculation and I've drawn a blank when searching for advice (mostly, I suspect, because I don't know the correct terminology).

mnel · Accepted Answer

Here is a data.table approach similar to @mrdwab's answer.

It works best if featureCode is a character vector, so the code converts it first.

library(data.table)

# dat is the long-form data frame from the question (columns id, featureCode)
DT <- data.table(dat)
# convert to character
DT[, featureCode := as.character(featureCode)]
# count rows per id (added by reference as column N), keep ids with > 1 row
DT2 <- DT[, N := .N, by = id][N > 1]
# create all combinations of 2 feature codes within each id,
# return each as a one-row data.table with columns `V1` and `V2`,
# then count the number of rows in each (V1, V2) group
DT2[, rbindlist(combn(featureCode, 2,
      FUN = function(x) as.data.table(as.list(x)), simplify = FALSE)),
    by = id][, .N, by = list(V1, V2)]


     V1   V2 N
1: PPLC PCLI 3
2:  PPL PPLC 1
3:  PPL PCLI 1
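The result above is in long form, and `combn` keeps the order in which the codes appear within each id, so an id listing PCLI before PPLC and another listing PPLC before PCLI would produce two separate rows. A sketch extending the approach (not part of the original answer): drop duplicate id/featureCode rows (assuming a repeated code should count once), sort the codes within each id before calling `combn`, then spread the counts into the square, zero-diagonal layout from the question. The data is typed in from the question; `pairs`, `codes` and `mat` are illustrative names.

```r
library(data.table)

# Example data typed in from the question
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC")
)

DT <- as.data.table(dat)
DT[, featureCode := as.character(featureCode)]

# One row per id/featureCode, then only ids with at least two distinct codes
DT <- unique(DT)
DT2 <- DT[, if (.N > 1) .SD, by = id]

# Sort codes within each id so (A, B) and (B, A) become the same pair;
# combn() returns a 2-row matrix with one column per pair
pairs <- DT2[, {
  m <- combn(sort(featureCode), 2)
  list(V1 = m[1, ], V2 = m[2, ])
}, by = id][, .N, by = .(V1, V2)]

# Spread the pair counts into a symmetric matrix with zeros on the diagonal
codes <- sort(unique(DT$featureCode))
mat <- matrix(0L, length(codes), length(codes),
              dimnames = list(codes, codes))
mat[cbind(pairs$V1, pairs$V2)] <- pairs$N
mat[cbind(pairs$V2, pairs$V1)] <- pairs$N
mat
```

Indexing `mat` with a two-column character matrix matches each row against the dimnames, so each pair count is written into both [V1, V2] and [V2, V1].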


Tags: dataframe, r

Iain Dillingham


mnel
