Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Obtaining connected components in R

I have a matrix with values 0 or 1 and I would like to obtain a list of groups of adjacent 1's.

For example, the matrix

mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    1    0    0    1    0
[3,]    0    0    1    0    0
[4,]    0    0    0    0    0
[5,]    1    1    1    1    1

should return the following 4 connected components:

C1 = {(1,1);(2,1)}

C2 = {(2,4)}

C3 = {(3,3)}

C4 = {(5,1);(5,2);(5,3);(5,4);(5,5)}

Does anybody has an idea of how to do it fast in R? My real matrix is indeed rather large, like 2000x2000 (but I expect that the number of connected components to be reasonably small, i.e. 200).

like image 351
Instanton Avatar asked Mar 03 '16 12:03

Instanton


People also ask

Can we find connected components using BFS?

Breadth-first search (BFS) is used for finding all connected components in a graph (finding all nodes within one connected component). The set of nodes reached by a BFS are the largest connected component containing the start node.

How do you find strongly connected?

We can find all strongly connected components in O(V+E) time using Kosaraju's algorithm. Following is detailed Kosaraju's algorithm. Create an empty stack 'S' and do DFS traversal of a graph. In DFS traversal, after calling recursive DFS for adjacent vertices of a vertex, push the vertex to stack.

How are components connected?

Connected Component DefinitionA set of nodes forms a connected component in an undirected graph if any node from the set of nodes can reach any other node by traversing edges. The main point here is reachability. In connected components, all the nodes are always reachable from each other.

What is the difference between connected and strongly connected?

Connected is usually associated with undirected graphs (two way edges): there is a path between every two nodes. Strongly connected is usually associated with directed graphs (one way edges): there is a route between every two nodes.


1 Answers

With the update, you can turn your binary matrix into a raster object and use the clumps function. Then it is just data management to return the exact format you want. Example below:

library(igraph)
library(raster)

mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))
Rmat <- raster(mat)
Clumps <- as.matrix(clump(Rmat, directions=4))

#turn the clumps into a list
tot <- max(Clumps, na.rm=TRUE)
res <- vector("list",tot)
for (i in 1:tot){
  res[i] <- list(which(Clumps == i, arr.ind = TRUE))
}

Which then res prints out at the console:

> res
[[1]]
     row col
[1,]   1   1
[2,]   2   1

[[2]]
     row col
[1,]   2   4

[[3]]
     row col
[1,]   3   3

[[4]]
     row col
[1,]   5   1
[2,]   5   2
[3,]   5   3
[4,]   5   4
[5,]   5   5

I wouldn't be surprised if there is a better way to go from the raster object to your end goal though. Again a 2000 by 2000 matrix should not be a big deal for this.


Old (wrong answer) but should be useful for people who want connected components of a graph.

You can use the igraph package to turn your adjacency matrix into a network and return the components. Your example graph is one component, so I removed one edge for illustration.

library(igraph)
mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))
g  <- graph.adjacency(mat) %>% delete_edges("5|3")
plot(g)
clu <- components(g)
groups(clu)

The final line then returns at the prompt:

> groups(clu)
$`1`
[1] 1 2 4 5

$`2`
[1] 3

My experience with this algorithm it is pretty fast - so I don't think 2,000 by 2,000 will be a problem.

like image 55
Andy W Avatar answered Oct 20 '22 16:10

Andy W