Given data:
D = data.frame(f1=c("a","a","b","b","b"), f2=c("X","Y","X","Y","Z"), val=1:5)
D
f1 f2 val
1 a X 1
2 a Y 2
3 b X 3
4 b Y 4
5 b Z 5
How to crosstabulate f2
pairs so that each element of the table corresponds to the sum of val
for each pair of f2
where they share the same f1
?
# desired result (matrix diagonal could be either NA or 0):
X Y Z
X NA 10 8
Y 10 NA 9
Z 8 9 NA
For example, X
and Y
share a
, therefore (1 + 2)
and also share b
therefore (3 + 4)
. Therefore result[1, 2] = (1 + 2) + (3 + 4) = 10
. Also, result[2, 3] = 4 + 5 = 9
, etc.
It helps to think of f1
as stocks, f2
as portfolios and val
as invested holding amounts of each stock in each portfolio. The resulting table thus represents "herding" - the degree to which different portfolios tend to hold the same stock.
Is there any "smart" R
function similar to table
or sweep
suitable for the task?
I have finally found solution using cartesian self-join in data.table
package and dcast
ing the table of pairwise sums:
library(data.table)
D = data.table(f1=factor(c("a","a","b","b","b")), f2=factor(c("X","Y","X","Y","Z")), val=1:5)
setkey(D, f1)
DD = D[D, allow.cartesian=TRUE]
DD[, pair_sum := val + i.val]
DD[f2==i.f2, pair_sum := NA_real_]
dcast.data.table(DD, f2 ~ i.f2, fun.aggregate = sum, value.var="pair_sum")
# f2 X Y Z
# 1: X NA 10 8
# 2: Y 10 NA 9
# 3: Z 8 9 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With