Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating distinct groups of nodes in a network

The Issue

Given the following network of nodes and edges, I would like to derive all possible groupings of nodes where all nodes within a group are connected to all other nodes within that group via an edge.

Network

In this network...

  • nodes 'B', 'C', and 'F' would be in a group as they are fully interconnected
  • 'A' would only belong in a group with itself.
  • 'D' and 'B' would be in a group together, but 'D' would not belong in the group with 'B', 'C', and 'F' because it is not connected directly to 'C' and 'F' via an edge.

In other words, the rules are as follows:

  1. All members of a group must be connected to all other members of that group directly via an edge.

  2. An object may be a member of multiple groups.

  3. No redundant groups. If a group can fit within a larger group, it is not a group. (Ex. 'B' and 'C' do not comprise a valid group on their own because they both fit within the larger group of 'B', 'C', and 'F'). An object may only be in a singular group (ex. A-A) if it belongs to no other groups.


I have represented the network above as a dataframe where each row represent pairs of nodes (x1 and x2) bound by an edge:

x1 <- c("A", "B", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "F", "F", "F")
x2 <- c("A", "B", "C", "D", "F", "B", "C", "F", "B", "D", "E", "D", "E", "B", "C", "F")

df <- data.frame(x1, x2)

Given this df, I would like to derive the following valid groups (provided in visual as well as data frame form):

enter image description here

     1    2    3    4   
1    A    B    B    D       
2   NULL  C    D    E 
3   NULL  F   NULL NULL 

**Note: the order of groups/group names is irrelevant.


What I have tried

I have attempted to loop through a list of each unique node name in column x1 of df to identify all nodes that each node is connected to. I then use this information to generate group rosters. However, these group rosters are sometimes invalidated by violating rule 1. Here is what I have thus far...

n <- nrow(as.data.frame(unique(df$x1)))

RosterGuide <- as.data.frame(matrix(nrow = n , ncol = 1)) 
RosterGuide$V1 <- seq.int(nrow(RosterGuide))
RosterGuide$Object <- (unique(df$x1))
colnames(RosterGuide) <- c("V1","Object")
groups_frame <- matrix(, ncol= length(n), nrow = length(n))

for (loopItem in 1:nrow(RosterGuide)) {

object <- subset(RosterGuide$Object, RosterGuide$V1 == loopItem)
group <- as.data.frame(subset(df$x2, df$x1 == object))

groups_frame <- cbind.fill(group, groups_frame, fill = "NULL")
}

Groups <- as.data.frame(groups_frame)
Groups <- subset(Groups, select = - c(object))
colnames(Groups) <- RosterGuide$V1

... this loop yields the data frame 'Groups'...

     1    2    3    4   5    6
1    B    D    B    B   B    A
2    C    E    D    C   C NULL
3    F NULL    E    F   D NULL
4 NULL NULL NULL NULL   F NULL

This is where I am at. You can see that group 3 violates the first rule because 'B' and 'E' are not directly connected by an edge, group 5 violates the first rule because 'F' and 'D' and 'F' and 'C' are not directly connected via an edge, and group 4 violates the third rule because it is a duplication of group 1 (I am less worried about 3rd rule violations, I can solve that one easily).

I am at a loss in trying to get from the data frame 'Groups' to the valid output I suggested above in a way that is universal to any data frame like df (2 columns, infinite rows) that describes the nodes and edges of a network of any size.

like image 408
Ben Avatar asked Apr 29 '19 20:04

Ben


1 Answers

Convert your data frame representation of the network to an igraph object. Use max_cliques to find "all the maximal cliques in an undirected graph".

library(igraph)
g <- graph_from_data_frame(df, directed = FALSE)
mc <- max_cliques(g, min = 1)
mc
# [[1]]
# + 1/6 vertex, named, from eb2aa45:
# [1] A
# 
# [[2]]
# + 2/6 vertices, named, from eb2aa45:
# [1] D E
# 
# [[3]]
# + 2/6 vertices, named, from eb2aa45:
# [1] D B
# 
# [[4]]
# + 3/6 vertices, named, from eb2aa45:
# [1] B F C

Grab the names of the vertices of the maximal cliques. Create corresponding group numbers and convert to data frame:

nm <- lapply(mc, attr, "names")
d <- data.frame(g = rep(seq_len(length(nm)), lengths(nm)), vert = unlist(nm))
d
#   g vert
# 1 1    A
# 2 2    D
# 3 2    E
# 4 3    D
# 5 3    B
# 6 4    B
# 7 4    F
# 8 4    C

simplify graph, plot it, highlight vertex groups using the list above in mark.groups. Prettify according to taste (see ?plot.igraph).

plot(simplify(g), mark.groups = nm, mark.border = "red", mark.col = NA)

enter image description here

like image 56
Henrik Avatar answered Nov 03 '22 18:11

Henrik