Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identifying "clusters" or "groups" in a matrix

I have a matrix that is populated with discrete elements, and I need to cluster them into intact groups. So, for example, take this matrix:

[A B B C A]
[A A B A A]
[A B B C C]
[A A A A A]

There would be two separate clusters for A, two separate clusters for C, and one cluster for B.

The output I'm looking for would ideally assign a unique ID to each clister, something like this:

[1 2 2 3 4]
[1 1 2 4 4]
[1 2 2 5 5]
[1 1 1 1 1]

Right now I have an R code that does this recursively by just iteratively checking nearest neighbor, but it quickly overflows when the matrix gets large (i.e., 100x100).

Is there a built in function in R that can do this? I looked into raster and image processing, but no luck. I'm convinced it must be out there.

Thanks!

like image 207
user3037237 Avatar asked Jun 27 '14 15:06

user3037237


2 Answers

You could approach this by building a lattice graph representing your matrix, where edges are only retained if the vertices have the same type:

# Build initial matrix and lattice graph
library(igraph)
mat <- matrix(c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 1, 3, 1), nrow=4)
labels <- as.vector(mat)
g <- graph.lattice(dim(mat))
lyt <- layout.auto(g)

# Remove edges between elements of different types
edgelist <- get.edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]]
g <- delete.edges(g, E(g)[!retain])

# Take a look at what we have
plot(g, layout=lyt)

enter image description here

Vertices are numbered going down columns. It's easy to see that all we need to do is grab the components of this graph:

matrix(clusters(g)$membership, nrow=nrow(mat))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    2    3    4
# [2,]    1    1    2    4    4
# [3,]    1    2    2    5    5
# [4,]    1    1    1    1    1

If you wanted to include diagonals in the lattice, you might start with a lattice with neighborhood size 2 and then limit to elements that are no more than one row or one column apart. Consider the following matrix:

[A B C B]
[B A A A]

Here's the code that will capture 4 groups, not 6, due to including diagonal links:

# Build initial matrix and lattice graph (neighborhood size 2)
mat <- matrix(c(1, 2, 2, 1, 3, 1, 2, 1), nrow=2)
labels <- as.vector(mat)
rows <- (seq(length(labels)) - 1) %% nrow(mat)
cols <- ceiling(seq(length(labels)) / nrow(mat))
g <- graph.lattice(dim(mat), nei=2)

# Remove edges between elements of different types or that aren't diagonal
edgelist <- get.edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]] &
  abs(rows[edgelist[,1]] - rows[edgelist[,2]]) <= 1 &
  abs(cols[edgelist[,1]] - cols[edgelist[,2]]) <= 1
g <- delete.edges(g, E(g)[!retain])

# Cluster to obtain final groups
matrix(clusters(g)$membership, nrow=nrow(mat))
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    2    1    1    1
like image 68
josliber Avatar answered Sep 25 '22 02:09

josliber


I'm not quite sure whether this answers the same problem, but I recently wrote some code which groups wall segments in a maze in the same manner, i.e. nearest-neighbor. Mine is iterative, and makes use of the dist() function. Here's some of the code I used.

I start with a N*4 matrix containing all the wall segments (generated using Prim's Tree Alg); the columns being (x0,y0,x1,y1) defining the endpoints of a given segment. All segments start and end on integer grid points and are of length 1. Each element of treelist contains all clustered segments. For the question posted, this should be a little easier because each item has only one coordinate (row,column) rather than two.

treelist<-list()
 treecnt<-1
 #kill edge walls, i.e. wall segments on the border of the maze.
 #  edges<- which(dowalls[,1]==dowalls[,3] | dowalls[,2]==dowalls[,4])
 vedges <- which( (dowalls[,1]==dowalls[,3]) & (dowalls[,1]==1 | dowalls[,1]==dimx+1) )
 hedges <- which( (dowalls[,2]==dowalls[,4]) & (dowalls[,2]==1 | dowalls[,1]==dimy+1) )
 dowalls<-dowalls[-c(vedges,hedges),,drop=FALSE]
 # now sort into trees 
 while(nrow(dowalls)>0 ) {
     tree <- matrix(dowalls[1,],nr=1) #force dimensions 
     dowalls<-dowalls[-1,,drop=FALSE]
     treerow <- 1 #current row of tree we're looking at
     while ( treerow <= nrow(tree) ) {
         #only examine the first 'column' of the dist() matrix 'cause those are the
         # distances from the tree[] endpoints
         touch <- c( which(dist(rbind(tree[treerow,1:2],dowalls[,1:2]) )[1:nrow(dowalls)]==0),  which(dist(rbind(tree[treerow,1:2],dowalls[,3:4]) )[1:nrow(dowalls)]==0), which(dist(rbind(tree[treerow,3:4],dowalls[,1:2]) )[1:nrow(dowalls)]==0), which(dist(rbind(tree[treerow,3:4],dowalls[,3:4]) )[1:nrow(dowalls)]==0) )
         if(length(touch) ) {
            tree <- rbind(tree,dowalls[c(touch),])
            dowalls <- dowalls[-c(touch),,drop=FALSE] 
            }
    # now be careful: want to track the row of tree[] we're working with AND
    # track how many rows there currently are in tree[]
        treerow <- treerow + 1 
    } #end of while treerow <= nrow 
    treelist[[treecnt]]<-tree
    treecnt <- treecnt + 1 
} #end ; all walls have been classified
like image 37
Carl Witthoft Avatar answered Sep 24 '22 02:09

Carl Witthoft