R create adjacency matrix according to columns from data.frame

I have a data.frame for 10 videos, and each column is a tag indicating the category of the video. For example, the data will look like this:

data <- data.frame(id=paste0("r", 1:10), A=sample(0:1,10,TRUE), B=sample(0:1,10,TRUE), C=sample(0:1,10,TRUE))
data
    id A B C
1   r1 1 0 1
2   r2 0 0 0
3   r3 0 1 0
4   r4 1 1 0
5   r5 0 0 0
6   r6 1 0 1
7   r7 1 0 1
8   r8 0 1 1
9   r9 0 0 1
10 r10 1 0 0

Now I would like to form an adjacency matrix based on the tags, where each value is the number of videos that share both tags. For example, cell A-C should be 3, because r1, r6 and r7 are tagged with both A and C. Finally, I would like an output matrix like the following:

     A    B    C
A    5    1    3
B    1    3    1
C    3    1    5

How could I aggregate the data?

Boxuan asked Nov 24 '14 23:11

1 Answer

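Matrix multiplication should work here. With the tag columns as a 0/1 matrix, t(mat) %*% mat gives, for each pair of tags, the number of rows that have both; the diagonal holds the count for each tag on its own.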

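# Reproducible example data (random, so it will differ from the table in the question)
set.seed(1)
dat <- data.frame(id=paste0("r", 1:10), A=sample(0:1,10,TRUE), B=sample(0:1,10,TRUE), C=sample(0:1,10,TRUE))

# Drop the id column and convert the 0/1 tag columns to a matrix
mat <- as.matrix(dat[-1])

# Entry [i, j] counts the rows in which tags i and j are both 1
t(mat) %*% mat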
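
To check this against the expected output in the question, here is a small sketch that types in the question's table by hand instead of sampling it. The name `co` is only an illustrative choice, and the values are copied from the printed table, not regenerated.

```r
# Data copied from the table printed in the question (not randomly generated)
dat <- data.frame(
  id = paste0("r", 1:10),
  A  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  B  = c(0, 0, 1, 1, 0, 0, 0, 1, 0, 0),
  C  = c(1, 0, 0, 0, 0, 1, 1, 1, 1, 0)
)

# Drop the id column and keep the 0/1 tag columns as a matrix
mat <- as.matrix(dat[-1])

# co[i, j] is the number of rows where tags i and j are both 1;
# the diagonal co[i, i] is the number of rows carrying tag i
co <- t(mat) %*% mat
co
#   A B C
# A 5 1 3
# B 1 3 1
# C 3 1 5
```

This matches the matrix requested in the question, including the A-C cell of 3 for r1, r6 and r7.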

EDIT

Or as a one-liner (thanks @AnandaMahto), since crossprod(x) computes t(x) %*% x:

crossprod(as.matrix(dat[-1]))
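
As a quick sanity check, the sketch below compares the two forms and a direct count for one pair of tags. It assumes the values from the question's table, entered by hand; the variable names are illustrative only.

```r
# Tag columns copied from the table in the question
mat <- cbind(
  A = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  B = c(0, 0, 1, 1, 0, 0, 0, 1, 0, 0),
  C = c(1, 0, 0, 0, 0, 1, 1, 1, 1, 0)
)

via_crossprod <- crossprod(mat)   # same result as t(mat) %*% mat
via_product   <- t(mat) %*% mat

# Both approaches should agree, including the dimnames
same_result <- isTRUE(all.equal(via_crossprod, via_product))

# Direct count for a single cell: rows tagged with both A and C
direct_AC <- sum(mat[, "A"] == 1 & mat[, "C"] == 1)

same_result
direct_AC
via_crossprod["A", "C"]
```

The direct count is what each off-diagonal cell represents, so it is a convenient way to spot-check the result on real data.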

user20650 answered Sep 24 '22 03:09