Given the following matrix in the R programming language:
set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
Here is a Depth-First Search (DFS) algorithm that identifies the clusters of 1s in this matrix. In this context, a "cluster" is a contiguous group of cells holding the same integer, with a minimum cluster size of 3 and 8-connectivity (i.e., diagonal neighbors count). Note: I tried an image-based approach with the EBImage package, but it was too slow for my purposes. I have thousands of 100 x 100 matrices to analyze!
find_clusters <- function(matrix) {
  rows <- nrow(matrix)
  cols <- ncol(matrix)
  # Create a matrix of the same size to mark visited cells
  visited <- matrix(0, nrow = rows, ncol = cols)
  # Define all 8 possible movements from a cell (8-connectivity)
  row_nbr <- c(-1, -1, -1, 0, 0, 1, 1, 1)
  col_nbr <- c(-1, 0, 1, -1, 1, -1, 0, 1)
  # A function to check if a cell can be included in the DFS
  is_valid <- function(row, col) {
    row >= 1 && row <= rows && col >= 1 && col <= cols &&
      visited[row, col] == 0 && matrix[row, col] == 1
  }
  # A function to do a DFS of a 2D boolean matrix. It only considers
  # the 8 cells directly connected to a cell
  DFS <- function(matrix, row, col, visited, cluster) {
    row_stack <- c(row)
    col_stack <- c(col)
    while (length(row_stack) > 0) {
      r <- row_stack[length(row_stack)]
      c <- col_stack[length(col_stack)]
      row_stack <- row_stack[-length(row_stack)]
      col_stack <- col_stack[-length(col_stack)]
      if (visited[r, c] == 0) {
        visited[r, c] <- 1
        cluster <- rbind(cluster, c(r, c))
        for (k in 1:8) {
          if (is_valid(r + row_nbr[k], c + col_nbr[k])) {
            row_stack <- c(row_stack, r + row_nbr[k])
            col_stack <- c(col_stack, c + col_nbr[k])
          }
        }
      }
    }
    return(cluster)
  }
  # The main function that returns all clusters
  get_clusters <- function(matrix, visited) {
    clusters <- list()
    for (i in 1:rows) {
      for (j in 1:cols) {
        if (visited[i, j] == 0 && matrix[i, j] == 1) {
          new_cluster <- DFS(matrix, i, j, visited, matrix(, nrow = 0, ncol = 2))
          if (nrow(new_cluster) >= 3) {
            clusters[[length(clusters) + 1]] <- new_cluster
          }
        }
      }
    }
    return(clusters)
  }
  return(get_clusters(matrix, visited))
}
This works and it's fast. However, the function returns ALL possible clusters of size >= 3 (44 total), which includes smaller clusters nested within larger clusters.
The matrix as a binary image:
my_palette <- c("white", "black")
# correct for how image() reads a matrix
rotate <- function(x) t(apply(x, 2, rev))
image(rotate(matrix_1),
      axes = FALSE,
      col = my_palette)
I see only three clusters of size >= 3. How do I revise my function to "see" only the largest unbroken clusters on a matrix?
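A likely explanation for the 44 results, assuming R's usual copy-on-modify semantics: `DFS()` marks cells in its own local copy of `visited`, so the outer loop never sees any cell as visited and starts a fresh search from every 1-cell. Each cluster is then reported once per member cell, which fits the patch sizes shown further down (24 + 8 + 12 = 44). The sketch below keeps a single `visited` matrix in one scope so that each cluster is found only once. The name `find_clusters_fixed` and the `min_size` argument are illustrative and not part of the original code.

```r
# Sketch only: iterative DFS with one shared `visited` matrix.
# `find_clusters_fixed` and `min_size` are hypothetical names.
find_clusters_fixed <- function(m, min_size = 3) {
  rows <- nrow(m)
  cols <- ncol(m)
  visited <- matrix(FALSE, nrow = rows, ncol = cols)
  # Offsets of the 8 neighbours (8-connectivity)
  row_nbr <- c(-1, -1, -1, 0, 0, 1, 1, 1)
  col_nbr <- c(-1, 0, 1, -1, 1, -1, 0, 1)
  clusters <- list()
  for (i in seq_len(rows)) {
    for (j in seq_len(cols)) {
      if (m[i, j] != 1 || visited[i, j]) next
      # Start a new cluster; mark cells when they are pushed,
      # so no cell enters the stack twice
      stack_r <- i
      stack_c <- j
      visited[i, j] <- TRUE
      cells <- list()
      while (length(stack_r) > 0) {
        n <- length(stack_r)
        r <- stack_r[n]
        cc <- stack_c[n]
        stack_r <- stack_r[-n]
        stack_c <- stack_c[-n]
        cells[[length(cells) + 1]] <- c(r, cc)
        for (k in 1:8) {
          nr <- r + row_nbr[k]
          nc <- cc + col_nbr[k]
          if (nr >= 1 && nr <= rows && nc >= 1 && nc <= cols &&
              !visited[nr, nc] && m[nr, nc] == 1) {
            visited[nr, nc] <- TRUE
            stack_r <- c(stack_r, nr)
            stack_c <- c(stack_c, nc)
          }
        }
      }
      if (length(cells) >= min_size) {
        # One row per cell: column 1 = row index, column 2 = column index
        clusters[[length(clusters) + 1]] <- do.call(rbind, cells)
      }
    }
  }
  clusters
}

set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
clusters <- find_clusters_fixed(matrix_1)
sapply(clusters, nrow)  # cluster sizes
```

Because the loop and the search share one `visited` matrix, every 1-cell is visited exactly once, so the result should no longer contain repeated or nested copies of the same cluster.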
UPDATE
Thank you @I_O! I have 10,000 matrices of size 100 x 100 from a MATLAB simulation that models the behavior of sodium channels on a cellular membrane. The following function implements your suggestion and returns the cluster sizes of channel types 1 and 2:
library(dplyr)
library(terra)
# M: matrix of integers
find_clusters_2chan <- function(M) {
  # Consider only 1s
  ones <- M == 1
  # convert matrix to raster
  raster_ones <- ones |> rast()
  # find clusters (consider zeros as NA, i.e. discontinuation)
  clusters_ones <- patches(raster_ones,
                           directions = 8,
                           zeroAsNA = TRUE)
  # generate frequency table
  ones_freq <- clusters_ones |> freq()
  # return counts >= 3
  ones_freq$count %>%
    .[. >= 3] -> ONES
  #-------------------------------------------------------------------------------
  # Consider only 2s
  twos <- M == 2
  # convert matrix to raster
  raster_twos <- twos |> rast()
  # find clusters (consider zeros as NA, i.e. discontinuation)
  clusters_twos <- patches(raster_twos,
                           directions = 8,
                           zeroAsNA = TRUE)
  # generate frequency table
  twos_freq <- clusters_twos |> freq()
  # return counts >= 3
  twos_freq$count %>%
    .[. >= 3] -> TWOS
  clusters_list <- list(channel_1 = ONES,
                        channel_2 = TWOS)
  return(clusters_list)
}
start <- Sys.time()
clusters_big_list <- lapply(list_of_matrices, find_clusters_2chan)
end <- Sys.time()
end - start
# run time = 3.902859 minutes
If you're fine with using a dedicated raster-analysis package like {terra}, the following should be convenient and fast (edit: a wrapper function that generalizes across channel values and minimum cluster sizes is at the bottom):
library(terra)
set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
raster_1 <- matrix_1 |> rast()
the_clusters <- patches(raster_1, directions = 8, zeroAsNA=TRUE)
the_clusters |> plot()

the_clusters |> freq()
  layer value count
1     1     1    24
2     1     2     8
3     1     3     1
4     1     4     2
5     1     5    12
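The frequency table gives cluster sizes only, whereas the DFS function in the question returned cell coordinates. A minimal sketch of recovering the coordinates of each sufficiently large patch follows. The helper name `cluster_cells` and its arguments are hypothetical, and it assumes that `as.matrix(..., wide = TRUE)` returns the patch IDs in the same row/column layout as the input matrix.

```r
library(terra)

# Hypothetical helper: row/column indices of every patch of `value`
# that has at least `min_size` cells (8-connectivity).
cluster_cells <- function(m, value = 1, min_size = 3) {
  r <- rast((m == value) * 1)                      # 1 where the cell matches, else 0
  p <- patches(r, directions = 8, zeroAsNA = TRUE) # zeros act as gaps between patches
  f <- freq(p)                                     # columns: layer, value, count
  keep <- f$value[f$count >= min_size]             # IDs of patches that are big enough
  ids <- as.matrix(p, wide = TRUE)                 # patch IDs as an ordinary matrix
  lapply(keep, function(id) which(ids == id, arr.ind = TRUE))
}

set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
cells <- cluster_cells(matrix_1)
sapply(cells, nrow)  # sizes of the retained clusters
```

Each list element is a two-column matrix of `row` and `col` indices, comparable to what the question's `find_clusters()` was meant to return.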
A function to extract clusters of a minimum size for any number of channels could look like this:
find_clusters_2_all_chans <-
  function(M, channel_numbers, min_count) {
    channel_numbers |>
      Map(f = \(i){
        M |>
          rast() |>
          # keep only cells equal to channel i; everything else becomes NA
          app(fun = \(cell) ifelse(cell == i, cell, NA)) |>
          patches(directions = 8, zeroAsNA = TRUE) |>
          freq() |>
          # strict comparison: keeps patches with more than min_count cells
          (\(m) m[m['count'] > min_count,])()
      }) |>
      setNames(paste0('channel_', channel_numbers))
  }
Example: pick clusters larger than three pixels for channels 1 and 2:
find_clusters_2_all_chans(M, 1:2, 3)
$channel_1
  layer value count
1     1     1     4
2     1     2    10
3     1     3     6
4     1     5     6
6     1     7     4

$channel_2
  layer value count
5     1     5    12