
Overlap matrix in R

Tags:

r

I have the following data frame:

id channel
1  a
1  b
1  c
2  a
2  c
3  a

I would like to create an overlap matrix: a square matrix with a, b, c as both row and column labels, where each entry shows how many ids the two channels have in common (the diagonal is the number of ids in that channel). For the data above, the matrix would look like this:

  a b c
a 3 1 2
b 1 1 1
c 2 1 2

Thanks much in advance.

broccoli asked Jul 31 '12 18:07


2 Answers

This should do the trick:

df <- data.frame(id=c(1,1,1,2,2,3), channel=letters[c(1,2,3,1,3,1)]) # your data

m <- table(df[[1]], df[[2]])   ## Alternatively:  m <- do.call(table, df)
t(m) %*% m
#   a b c
# a 3 1 2
# b 1 1 1
# c 2 1 2
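
A note on why the product works, as a minimal sketch rather than part of the answer above: `m` has one row per id and one column per channel, so entry [i, j] of `t(m) %*% m` sums `m[id, i] * m[id, j]` over ids, which counts the ids present in both channels. This assumes each (id, channel) pair occurs at most once; the names `m01`, `overlap`, `ids_a` and `ids_c` are introduced here for illustration, and the clamp to 0/1 is an added guard for repeated rows.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                 channel = c("a", "b", "c", "a", "c", "a"))

# Incidence matrix: one row per id, one column per channel.
m <- table(df$id, df$channel)

# Guard against repeated (id, channel) rows: clamp counts to 0/1.
m01 <- (m > 0) * 1

# Entry [i, j] sums m01[id, i] * m01[id, j] over ids,
# i.e. it counts the ids present in both channel i and channel j.
overlap <- t(m01) %*% m01

# Direct count for one pair of channels, for comparison.
ids_a <- unique(df$id[df$channel == "a"])
ids_c <- unique(df$id[df$channel == "c"])
length(intersect(ids_a, ids_c)) == overlap["a", "c"]
```

Without the clamp, a duplicated row would inflate the counts, since the product would then multiply counts larger than 1.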
Josh O'Brien answered Oct 21 '22 23:10


library(plyr)
df
  id channel
1  1       a
2  1       b
3  1       c
4  2       a
5  2       c
6  3       a
tb <- table(ddply(df, .(id), function(x) {x$id <- x$channel; expand.grid(x)}))
tb
   channel
id  a b c
  a 3 1 2
  b 1 1 1
  c 2 1 2
names(dimnames(tb)) <- NULL
tb
  a b c
a 3 1 2
b 1 1 1
c 2 1 2

Now some explanation of how table() builds a matrix from two vectors. There is an example in ?table:

a <- letters[1:3]
(b <- sample(a))
[1] "b" "c" "a"
table(a, b)
   b
a   a b c
  a 0 1 0
  b 0 0 1
  c 1 0 0

So table() pairs up elements by position and counts each pair. Now if we have

  id channel
  1       a
  1       b
  1       c
  2       a
  ...

then the channels sharing an id can be found by splitting the data frame by id, making a copy of the channel column, and generating all combinations of the two columns. For id 1, for example:

tbl <- expand.grid(data.frame(x = c("a","b","c"), y = c("a", "b", "c")))
tbl
  x y
1 a a
2 b a
3 c a
4 a b
5 b b
6 c b
7 a c
8 b c
9 c c
table(tbl$x, tbl$y)

    a b c
  a 1 1 1
  b 1 1 1
  c 1 1 1
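
The same split-and-expand idea can be sketched in base R without plyr. This is only an illustration under the assumption that each (id, channel) pair appears once; `lv`, `pairs` and `tb2` are names made up for this sketch.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                 channel = c("a", "b", "c", "a", "c", "a"))
lv <- sort(unique(df$channel))

# For each id, pair every channel with every channel sharing that id.
pairs <- do.call(rbind, lapply(split(df$channel, df$id), function(ch) {
  expand.grid(x = ch, y = ch, stringsAsFactors = FALSE)
}))

# Cross-tabulate the pairs; fixing the factor levels keeps the matrix square.
tb2 <- table(factor(pairs$x, levels = lv), factor(pairs$y, levels = lv))
tb2
```

Each id contributes one count to every cell whose two channels it belongs to, so summing over ids gives the overlap matrix.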
Julius Vainora answered Oct 21 '22 22:10