I have a matrix with values 0 or 1 and I would like to obtain a list of groups of adjacent 1's.
For example, the matrix
mat = rbind(c(1,0,0,0,0),
c(1,0,0,1,0),
c(0,0,1,0,0),
c(0,0,0,0,0),
c(1,1,1,1,1))
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 1 0 0 1 0
[3,] 0 0 1 0 0
[4,] 0 0 0 0 0
[5,] 1 1 1 1 1
should return the following 4 connected components:
C1 = {(1,1);(2,1)}
C2 = {(2,4)}
C3 = {(3,3)}
C4 = {(5,1);(5,2);(5,3);(5,4);(5,5)}
Does anybody has an idea of how to do it fast in R? My real matrix is indeed rather large, like 2000x2000 (but I expect that the number of connected components to be reasonably small, i.e. 200).
Breadth-first search (BFS) is used for finding all connected components in a graph (finding all nodes within one connected component). The set of nodes reached by a BFS are the largest connected component containing the start node.
We can find all strongly connected components in O(V+E) time using Kosaraju's algorithm. Following is detailed Kosaraju's algorithm. Create an empty stack 'S' and do DFS traversal of a graph. In DFS traversal, after calling recursive DFS for adjacent vertices of a vertex, push the vertex to stack.
Connected Component DefinitionA set of nodes forms a connected component in an undirected graph if any node from the set of nodes can reach any other node by traversing edges. The main point here is reachability. In connected components, all the nodes are always reachable from each other.
Connected is usually associated with undirected graphs (two way edges): there is a path between every two nodes. Strongly connected is usually associated with directed graphs (one way edges): there is a route between every two nodes.
With the update, you can turn your binary matrix into a raster object and use the clumps function. Then it is just data management to return the exact format you want. Example below:
library(igraph)
library(raster)
mat = rbind(c(1,0,0,0,0),
c(1,0,0,1,0),
c(0,0,1,0,0),
c(0,0,0,0,0),
c(1,1,1,1,1))
Rmat <- raster(mat)
Clumps <- as.matrix(clump(Rmat, directions=4))
#turn the clumps into a list
tot <- max(Clumps, na.rm=TRUE)
res <- vector("list",tot)
for (i in 1:tot){
res[i] <- list(which(Clumps == i, arr.ind = TRUE))
}
Which then res
prints out at the console:
> res
[[1]]
row col
[1,] 1 1
[2,] 2 1
[[2]]
row col
[1,] 2 4
[[3]]
row col
[1,] 3 3
[[4]]
row col
[1,] 5 1
[2,] 5 2
[3,] 5 3
[4,] 5 4
[5,] 5 5
I wouldn't be surprised if there is a better way to go from the raster object to your end goal though. Again a 2000 by 2000 matrix should not be a big deal for this.
Old (wrong answer) but should be useful for people who want connected components of a graph.
You can use the igraph package to turn your adjacency matrix into a network and return the components. Your example graph is one component, so I removed one edge for illustration.
library(igraph)
mat = rbind(c(1,0,0,0,0),
c(1,0,0,1,0),
c(0,0,1,0,0),
c(0,0,0,0,0),
c(1,1,1,1,1))
g <- graph.adjacency(mat) %>% delete_edges("5|3")
plot(g)
clu <- components(g)
groups(clu)
The final line then returns at the prompt:
> groups(clu)
$`1`
[1] 1 2 4 5
$`2`
[1] 3
My experience with this algorithm it is pretty fast - so I don't think 2,000 by 2,000 will be a problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With