I have a data.frame for 10 videos, one per row, in which each column is a tag indicating the category of the video. For example, the data looks like this:
data <- data.frame(id=paste0("r", 1:10), A=sample(0:1,10,TRUE), B=sample(0:1,10,TRUE), C=sample(0:1,10,TRUE))
data
id A B C
1 r1 1 0 1
2 r2 0 0 0
3 r3 0 1 0
4 r4 1 1 0
5 r5 0 0 0
6 r6 1 0 1
7 r7 1 0 1
8 r8 0 1 1
9 r9 0 0 1
10 r10 1 0 0
Now I would like to form an adjacency matrix based on the tags, where each value is the number of videos that carry both tags. For example, cell A-C should be 3, because r1, r6 and r7 are tagged with both A and C. Finally, I would like an output matrix like the following:
A B C
A 5 1 3
B 1 3 1
C 3 1 5
How could I aggregate the data?
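Matrix multiplication should work here. Because the tag columns are 0/1 indicators, entry (i, j) of t(mat) %*% mat counts the videos that have both tag i and tag j: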
set.seed(1)
dat <- data.frame(id=paste0("r", 1:10), A=sample(0:1,10,TRUE), B=sample(0:1,10,TRUE), C=sample(0:1,10,TRUE))
mat <- as.matrix(dat[-1])
t(mat) %*% mat
EDIT
Or as a one-liner (thanks @AnandaMahto):
crossprod(as.matrix(dat[-1]))
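To check the approach against the table in the question, here is a minimal sketch that types in the ten rows from the question's printout by hand instead of calling sample(), since random draws can differ between R versions. The name adj is only chosen for illustration.

```r
# The ten rows printed in the question, entered by hand so the
# result does not depend on the random number generator.
dat <- data.frame(
  id = paste0("r", 1:10),
  A  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  B  = c(0, 0, 1, 1, 0, 0, 0, 1, 0, 0),
  C  = c(1, 0, 0, 0, 0, 1, 1, 1, 1, 0)
)

# Drop the id column and keep only the 0/1 tag indicators.
mat <- as.matrix(dat[-1])

# crossprod(mat) computes t(mat) %*% mat.
adj <- crossprod(mat)
adj
#   A B C
# A 5 1 3
# B 1 3 1
# C 3 1 5

# Spot check of one cell: videos tagged with both A and C (r1, r6, r7).
sum(mat[, "A"] * mat[, "C"])
# [1] 3
```

The diagonal holds the number of videos per tag (5 for A, 3 for B, 5 for C), which matches the output requested in the question.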