I have a matrix that is populated with discrete elements, and I need to cluster them into intact groups. So, for example, take this matrix:
[A B B C A]
[A A B A A]
[A B B C C]
[A A A A A]
There would be two separate clusters for A, two separate clusters for C, and one cluster for B.
The output I'm looking for would ideally assign a unique ID to each clister, something like this:
[1 2 2 3 4]
[1 1 2 4 4]
[1 2 2 5 5]
[1 1 1 1 1]
Right now I have an R code that does this recursively by just iteratively checking nearest neighbor, but it quickly overflows when the matrix gets large (i.e., 100x100).
Is there a built in function in R that can do this? I looked into raster and image processing, but no luck. I'm convinced it must be out there.
Thanks!
You could approach this by building a lattice graph representing your matrix, where edges are only retained if the vertices have the same type:
# Build initial matrix and lattice graph
library(igraph)
mat <- matrix(c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 1, 3, 1), nrow=4)
labels <- as.vector(mat)
g <- graph.lattice(dim(mat))
lyt <- layout.auto(g)
# Remove edges between elements of different types
edgelist <- get.edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]]
g <- delete.edges(g, E(g)[!retain])
# Take a look at what we have
plot(g, layout=lyt)
Vertices are numbered going down columns. It's easy to see that all we need to do is grab the components of this graph:
matrix(clusters(g)$membership, nrow=nrow(mat))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 2 3 4
# [2,] 1 1 2 4 4
# [3,] 1 2 2 5 5
# [4,] 1 1 1 1 1
If you wanted to include diagonals in the lattice, you might start with a lattice with neighborhood size 2 and then limit to elements that are no more than one row or one column apart. Consider the following matrix:
[A B C B]
[B A A A]
Here's the code that will capture 4 groups, not 6, due to including diagonal links:
# Build initial matrix and lattice graph (neighborhood size 2)
mat <- matrix(c(1, 2, 2, 1, 3, 1, 2, 1), nrow=2)
labels <- as.vector(mat)
rows <- (seq(length(labels)) - 1) %% nrow(mat)
cols <- ceiling(seq(length(labels)) / nrow(mat))
g <- graph.lattice(dim(mat), nei=2)
# Remove edges between elements of different types or that aren't diagonal
edgelist <- get.edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]] &
abs(rows[edgelist[,1]] - rows[edgelist[,2]]) <= 1 &
abs(cols[edgelist[,1]] - cols[edgelist[,2]]) <= 1
g <- delete.edges(g, E(g)[!retain])
# Cluster to obtain final groups
matrix(clusters(g)$membership, nrow=nrow(mat))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 2 1 1 1
I'm not quite sure whether this answers the same problem, but I recently wrote some code which groups wall segments in a maze in the same manner, i.e. nearest-neighbor. Mine is iterative, and makes use of the dist() function. Here's some of the code I used.
I start with a N*4 matrix containing all the wall segments (generated using Prim's Tree Alg); the columns being (x0,y0,x1,y1) defining the endpoints of a given segment. All segments start and end on integer grid points and are of length 1. Each element of treelist
contains all clustered segments. For the question posted, this should be a little easier because each item has only one coordinate (row,column) rather than two.
treelist<-list()
treecnt<-1
#kill edge walls, i.e. wall segments on the border of the maze.
# edges<- which(dowalls[,1]==dowalls[,3] | dowalls[,2]==dowalls[,4])
vedges <- which( (dowalls[,1]==dowalls[,3]) & (dowalls[,1]==1 | dowalls[,1]==dimx+1) )
hedges <- which( (dowalls[,2]==dowalls[,4]) & (dowalls[,2]==1 | dowalls[,1]==dimy+1) )
dowalls<-dowalls[-c(vedges,hedges),,drop=FALSE]
# now sort into trees
while(nrow(dowalls)>0 ) {
tree <- matrix(dowalls[1,],nr=1) #force dimensions
dowalls<-dowalls[-1,,drop=FALSE]
treerow <- 1 #current row of tree we're looking at
while ( treerow <= nrow(tree) ) {
#only examine the first 'column' of the dist() matrix 'cause those are the
# distances from the tree[] endpoints
touch <- c( which(dist(rbind(tree[treerow,1:2],dowalls[,1:2]) )[1:nrow(dowalls)]==0), which(dist(rbind(tree[treerow,1:2],dowalls[,3:4]) )[1:nrow(dowalls)]==0), which(dist(rbind(tree[treerow,3:4],dowalls[,1:2]) )[1:nrow(dowalls)]==0), which(dist(rbind(tree[treerow,3:4],dowalls[,3:4]) )[1:nrow(dowalls)]==0) )
if(length(touch) ) {
tree <- rbind(tree,dowalls[c(touch),])
dowalls <- dowalls[-c(touch),,drop=FALSE]
}
# now be careful: want to track the row of tree[] we're working with AND
# track how many rows there currently are in tree[]
treerow <- treerow + 1
} #end of while treerow <= nrow
treelist[[treecnt]]<-tree
treecnt <- treecnt + 1
} #end ; all walls have been classified
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With