
Count identical row values for each pair of columns to create network graph

Tags: r, igraph

I have data like this:

dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1,1,0,1,1),
  jeff = c(1,0,0,1,0),
  jane = c(0,1,1,0,0)
)

       music john jeff jane
1  classical    1    1    0
2       jazz    1    0    1
3    baroque    0    0    1
4 electronic    1    1    0
5    ambient    1    0    0

I want to graph the overlap between the individuals in the columns: how often do two people both have a 1 in the same row? If I could get to this data.frame:

result <- data.frame(person1 = c("john", "john", "jeff"), person2 = c("jeff", "jane", "jane"), overlap = c(2, 1, 0))

  person1 person2 overlap
1    john    jeff       2
2    john    jane       1
3    jeff    jane       0

I could create the graph I have in mind:

library(igraph)
g <- graph_from_data_frame(result, directed = FALSE)
plot(g, edge.width = result$overlap * 3)

But I'm struggling to transform the data to count row-wise overlap between each pair of columns. How can I do that?

Sam Firke asked Mar 12 '23 11:03


2 Answers

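Probably an easier approach is to create the adjacency matrix of the graph by taking the cross product of the 0/1 columns: entry (i, j) counts the rows in which person i and person j both have a 1, and the diagonal holds each person's own total. You can then read this matrix directly into igraph.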

library(igraph)

# Take the cross product: assumes each music type appears in only one row;
# otherwise aggregate the duplicate rows first
m <- crossprod(as.matrix(dat[-1]))

# You could remove the diagonal terms here
# although it is useful to see the sum for each individual
# You can also remove it in igraph, as below
# diag(m) <- 0

# Create graph
# The weights are stored in E(g)$weight
g <- graph_from_adjacency_matrix(m, mode="undirected", weighted = TRUE)

# Remove edge loops
g <- simplify(g)
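
If you also want the edge-list data.frame from the question, the upper triangle of the cross-product matrix can be unrolled into one row per pair. This is a minimal base-R sketch, assuming the same `dat` as in the question; the name `idx` is only illustrative.

```r
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

# Co-occurrence counts: m[i, j] = number of rows where both columns are 1
m <- crossprod(as.matrix(dat[-1]))

# Row/column positions of the cells above the diagonal (one per unordered pair)
idx <- which(upper.tri(m), arr.ind = TRUE)

# One row per pair, with the count looked up by matrix indexing
result <- data.frame(
  person1 = rownames(m)[idx[, "row"]],
  person2 = colnames(m)[idx[, "col"]],
  overlap = m[idx]
)
result
```

Unlike the weighted adjacency route, this keeps pairs with zero overlap (jeff and jane here) as explicit rows.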
user20650 answered Apr 25 '23 13:04


Maybe you want to experiment with different similarity/distance measures, like Russell/Rao, Jaccard, etc. After all, 0 and 0 can be interpreted as similarity, too. Anyway, here's another approach:

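library(proxy)
# Russell/Rao similarity is (rows where both are 1) / (number of rows);
# dist() returns it as a distance (1 - similarity), so invert and rescale to counts
m <- (1 - as.matrix(dist(t(dat[, -1]), method = "Russel"))) * nrow(dat)
# Keep each pair once: blank out the lower triangle and the diagonal
m[lower.tri(m, diag = TRUE)] <- NA
(res <- setNames(reshape2::melt(m, na.rm = TRUE), c("p1", "p2", "ol")))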
#     p1   p2 ol
# 4 john jeff  2
# 7 john jane  1
# 8 jeff jane  0
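
To try the Jaccard measure mentioned above without an extra package, it can be computed from two cross products. This is a base-R sketch under the assumption that the columns hold only 0 and 1; the names `both`, `either`, and `jaccard` are illustrative, and a person with no 1s at all would give `NaN` on the diagonal.

```r
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

x <- as.matrix(dat[-1])

# Rows where both people have a 1
both <- crossprod(x)

# Rows where at least one has a 1: all rows minus the rows where both have 0
either <- nrow(x) - crossprod(1 - x)

# Jaccard similarity: shared 1s divided by rows where either has a 1
jaccard <- both / either
jaccard
```

For this data john and jeff score 0.5 (2 shared out of 4 rows where either has a 1), john and jane 0.2, and jeff and jane 0.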
lukeA answered Apr 25 '23 14:04