I have data like this:
dat <- data.frame(
music = c("classical", "jazz", "baroque", "electronic", "ambient"),
john = c(1,1,0,1,1),
jeff = c(1,0,0,1,0),
jane = c(0,1,1,0,0)
)
       music john jeff jane
1  classical    1    1    0
2       jazz    1    0    1
3    baroque    0    0    1
4 electronic    1    1    0
5    ambient    1    0    0
I want to graph the overlap between the individuals in the columns: how often do two of them both have a 1 in the same row? If I could get to this data.frame:
result <- data.frame(person1 = c("john", "john", "jeff"), person2 = c("jeff", "jane", "jane"), overlap = c(2, 1, 0))
  person1 person2 overlap
1    john    jeff       2
2    john    jane       1
3    jeff    jane       0
I could create the graph I have in mind:
library(igraph)
g <- graph_from_data_frame(result, directed = FALSE)
plot(g, edge.width = result$overlap * 3)
But I'm struggling to transform the data to count row-wise overlap between each pair of columns. How can I do that?
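One direct way to build exactly the `result` table in base R is sketched here (this is not from the question or the answers). `combn()` enumerates every pair of person columns, and for each pair the code counts the rows in which both entries are 1. It assumes the person columns contain only 0s and 1s; `pairwise_overlap` is a hypothetical helper name.

```r
# Example data from the question
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

# Hypothetical helper: for every pair of 0/1 columns, count the rows
# in which both columns are 1
pairwise_overlap <- function(df) {
  pairs <- combn(names(df), 2)  # 2 x k character matrix, one column per pair
  data.frame(
    person1 = pairs[1, ],
    person2 = pairs[2, ],
    overlap = apply(pairs, 2, function(p) sum(df[[p[1]]] == 1 & df[[p[2]]] == 1)),
    stringsAsFactors = FALSE
  )
}

result <- pairwise_overlap(dat[-1])  # drop the 'music' column first
print(result)
```

The resulting data.frame has the same columns as the desired `result`, so it can be passed to igraph as in the question.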
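Probably an easier approach is to build the graph's adjacency matrix directly by taking the cross-product of the 0/1 columns. For a 0/1 matrix X, crossprod(X) computes t(X) %*% X, so entry [i, j] is the number of rows in which both person i and person j have a 1, which is exactly the overlap you want (the diagonal holds each person's total). You can then read this matrix directly into igraph.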
library(igraph)
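# Take the cross-product, t(X) %*% X: this assumes each music type
# appears in only one row; otherwise aggregate the rows first
m <- crossprod(as.matrix(dat[-1]))
# You could remove the diagonal terms here,
# although it is useful to see the total for each individual.
# You can also remove them in igraph, as below
# diag(m) <- 0
# Create the graph; the overlap counts are stored as edge weights in E(g)$weight
# (pairs with zero overlap get no edge)
g <- graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE)
# Remove self-loops (the diagonal terms)
g <- simplify(g)
# Plot with edge widths proportional to the overlap, as in the question
plot(g, edge.width = E(g)$weight * 3)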
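Because graph_from_adjacency_matrix() creates no edge for a 0 entry, the adjacency route drops pairs with zero overlap. If you want the person1/person2/overlap table from the question instead (for example, to keep the jeff/jane row), the cross-product matrix can be read back into that shape. This is a base-R sketch assuming the same `dat`; it reads only the upper triangle so each unordered pair appears once.

```r
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

# Entry [i, j] counts rows where both person i and person j have a 1
m <- crossprod(as.matrix(dat[-1]))

# Row/column positions of the upper triangle (diagonal excluded),
# returned in column-major order
idx <- which(upper.tri(m), arr.ind = TRUE)

result <- data.frame(
  person1 = rownames(m)[idx[, "row"]],
  person2 = colnames(m)[idx[, "col"]],
  overlap = m[idx],  # matrix indexing with a two-column index matrix
  stringsAsFactors = FALSE
)
print(result)
```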
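You may want to experiment with other similarity/distance measures, such as Russel/Rao or Jaccard; after all, two people who both have a 0 for the same music type could also be considered similar. Anyway, here's another approach. The Russel/Rao similarity between two binary columns is a/n, where a is the number of rows in which both are 1 and n is the number of rows. proxy's dist() returns it as the distance 1 - a/n, so (1 - distance) * nrow(dat) recovers the overlap count a: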
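library(proxy)
# proxy::dist() on the transposed data compares persons (columns);
# the Russel/Rao distance is 1 - a/n, so undo that to get the counts a
m <- (1 - as.matrix(dist(t(dat[, -1]), method = "Russel"))) * nrow(dat)
# Keep each pair once: blank out the diagonal and the lower triangle
m[lower.tri(m, diag = TRUE)] <- NA
# Reshape to long format, dropping the NA cells
(res <- setNames(reshape2::melt(m, na.rm = TRUE), c("p1", "p2", "ol")))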
#     p1   p2 ol
# 4 john jeff  2
# 7 john jane  1
# 8 jeff jane  0
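For the Jaccard measure mentioned above, here is a base-R sketch that avoids the extra packages. Both Jaccard, a / (a + b + c), and Russel/Rao, a / n, can be derived from the same cross-product matrix, since a + b + c = n_i + n_j - a, where n_i is the number of 1s in column i. The variable names are illustrative, and a person with no 1s at all would get NaN (0/0) on the diagonal.

```r
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

x <- as.matrix(dat[-1])
a <- crossprod(x)   # a[i, j]: rows where both i and j have a 1
n1 <- diag(a)       # number of 1s per person

# Jaccard similarity: a / (a + b + c), with a + b + c = n1[i] + n1[j] - a[i, j]
jac <- a / (outer(n1, n1, "+") - a)

# Russel/Rao similarity: a / n, where n is the number of rows (music types)
rr <- a / nrow(x)

round(jac, 3)
round(rr, 3)
```

Multiplying `rr` by `nrow(x)` gives back the raw overlap counts, which is what the proxy-based code above relies on.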