Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Crosstabulate and summarize factors by another shared factor

Tags:

r

Given data:

D = data.frame(f1=c("a","a","b","b","b"), f2=c("X","Y","X","Y","Z"), val=1:5)

D
  f1 f2 val
1  a  X   1
2  a  Y   2
3  b  X   3
4  b  Y   4
5  b  Z   5

How to crosstabulate f2pairs so that each element of the table corresponds to the sum of val for each pair of f2 where they share the same f1?

# desired result (matrix diagonal could be either NA or 0):
   X   Y   Z
X NA  10   8
Y 10  NA   9
Z  8   9  NA

For example, X and Y share a, therefore (1 + 2) and also share b therefore (3 + 4). Therefore result[1, 2] = (1 + 2) + (3 + 4) = 10. Also, result[2, 3] = 4 + 5 = 9, etc.

It helps to think of f1 as stocks, f2 as portfolios and val as invested holding amounts of each stock in each portfolio. The resulting table thus represents "herding" - the degree to which different portfolios tend to hold the same stock.

Is there any "smart" R function similar to table or sweep suitable for the task?

like image 466
Daniel Krizian Avatar asked Mar 17 '23 21:03

Daniel Krizian


1 Answers

I have finally found solution using cartesian self-join in data.table package and dcasting the table of pairwise sums:

library(data.table)
D = data.table(f1=factor(c("a","a","b","b","b")), f2=factor(c("X","Y","X","Y","Z")), val=1:5)

setkey(D, f1)

DD = D[D, allow.cartesian=TRUE]
DD[, pair_sum := val + i.val]
DD[f2==i.f2, pair_sum := NA_real_]
dcast.data.table(DD, f2 ~ i.f2, fun.aggregate = sum, value.var="pair_sum")
#    f2  X  Y  Z
# 1:  X NA 10  8
# 2:  Y 10 NA  9
# 3:  Z  8  9 NA
like image 138
Daniel Krizian Avatar answered Mar 20 '23 14:03

Daniel Krizian