Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cluster groups of 1s in a binary matrix

Tags:

r

binary

matrix

I'm looking to create clusters around all 1s and 0s. Similar to Mindsweeper, I want to basically "draw a circle" around all 1s, and create a border where 0s exist.

I have tried using hclust() and creating a distance matrix, but the actual table I am working with is very large, and I have run into problems with run time.

test_matrix  <- matrix(c( 1,1,0,0,0,0,1,     
                          1,1,1,0,0,1,0,
                          0,1,0,0,0,1,0,
                          0,0,0,1,1,1,0,
                          0,0,0,1,1,1,1),nrow=5)

Result looks like this:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    0    1    0    1    0
[2,]    1    1    0    0    0    1    1
[3,]    0    1    1    0    0    0    1
[4,]    0    1    0    0    0    0    1
[5,]    0    1    0    1    1    0    1

My rules are as follows: If any 1 is connected to any 1 via UP, DOWN, LEFT, RIGHT, DIAGONAL(any direction), continue growing the "cluster". Based on these rules (8 points of connection for every point), I can spot four unique clusters with isolated 1s.

How would you code to find these groups?

like image 560
Kyle Avatar asked Jun 26 '19 03:06

Kyle


1 Answers

I think clustering is the right approach here, but you choose a poor ( computationally expensive) method for the task. I would go for DBSCAN like this:

library(dbscan)

## slightly altered test matrix to include a "cluster" with a single 1
test_matrix  <- matrix(c( 1,1,0,0,0,0,1,     
                          1,1,1,0,0,1,0,
                          0,1,0,0,0,1,0,
                          0,0,0,1,1,1,0,
                          1,0,0,1,1,1,1),
                          nrow=5, byrow = TRUE)

## find rows and columns of 1s
ones_pos <- which(test_matrix > 0,arr.ind=TRUE)


## perform DBSCAN clustering
## setting eps = sqrt(2) + .1 corresponds to your neighbourhood definition
## setting minPts = 2 will mark clusters of one point as noise
clust <- dbscan(ones_pos, eps = sqrt(2), minPts = 2)

## find the indices of noise elements
singular_ones <- ones_pos[clust$cluster == 0, ]

singular_ones
#> row col 
#>  5   1 

To find all clusters (including those that just consist of one 1) just set minPts to 1. In this case there can be no noise. The cluster membership is stored in clust$cluster.

I am quite certain this approach will also be quite fast with large matrices.

like image 160
AEF Avatar answered Oct 14 '22 03:10

AEF