I'm looking to create clusters around all 1
s and 0
s. Similar to Mindsweeper, I want to basically "draw a circle" around all 1
s, and create a border where 0
s exist.
I have tried using hclust()
and creating a distance matrix, but the actual table I am working with is very large, and I have run into problems with run time.
test_matrix <- matrix(c( 1,1,0,0,0,0,1,
1,1,1,0,0,1,0,
0,1,0,0,0,1,0,
0,0,0,1,1,1,0,
0,0,0,1,1,1,1),nrow=5)
Result looks like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 0 0 1 0 1 0
[2,] 1 1 0 0 0 1 1
[3,] 0 1 1 0 0 0 1
[4,] 0 1 0 0 0 0 1
[5,] 0 1 0 1 1 0 1
My rules are as follows: If any 1
is connected to any 1
via UP, DOWN, LEFT, RIGHT, DIAGONAL(any direction), continue growing the "cluster". Based on these rules (8 points of connection for every point), I can spot four unique clusters with isolated 1
s.
How would you code to find these groups?
I think clustering is the right approach here, but you choose a poor ( computationally expensive) method for the task. I would go for DBSCAN like this:
library(dbscan)
## slightly altered test matrix to include a "cluster" with a single 1
test_matrix <- matrix(c( 1,1,0,0,0,0,1,
1,1,1,0,0,1,0,
0,1,0,0,0,1,0,
0,0,0,1,1,1,0,
1,0,0,1,1,1,1),
nrow=5, byrow = TRUE)
## find rows and columns of 1s
ones_pos <- which(test_matrix > 0,arr.ind=TRUE)
## perform DBSCAN clustering
## setting eps = sqrt(2) + .1 corresponds to your neighbourhood definition
## setting minPts = 2 will mark clusters of one point as noise
clust <- dbscan(ones_pos, eps = sqrt(2), minPts = 2)
## find the indices of noise elements
singular_ones <- ones_pos[clust$cluster == 0, ]
singular_ones
#> row col
#> 5 1
To find all clusters (including those that just consist of one 1) just set minPts
to 1. In this case there can be no noise. The cluster membership is stored in clust$cluster
.
I am quite certain this approach will also be quite fast with large matrices.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With