Finding Largest Contiguous Clusters in a Matrix

Question

Given the following matrix in the R programming language:

set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)

Here is a Depth-First Search (DFS) algorithm that identifies the clusters of 1s in this matrix. In this context, a "cluster" is a contiguous group of cells holding the same integer, with a minimum cluster size of 3 and 8-connectivity (i.e., diagonal neighbors count as connected). Note: I tried an image-based approach with the EBImage package, but it was too slow for my purposes. I have thousands of 100x100 matrices to analyze!

find_clusters <- function(matrix) {
  rows <- nrow(matrix)
  cols <- ncol(matrix)
  
  # Create a matrix of the same size to mark visited cells
  visited <- matrix(0, nrow = rows, ncol = cols)
  
  # Define all 8 possible movements from a cell (8-connectivity)
  row_nbr <- c(-1, -1, -1,  0, 0,  1, 1, 1)
  col_nbr <- c(-1,  0,  1, -1, 1, -1, 0, 1)
  
  # A function to check if a cell can be included in the DFS
  is_valid <- function(row, col) {
    row >= 1 && row <= rows && col >= 1 && col <= cols &&
      visited[row, col] == 0 && matrix[row, col] == 1
  }
  
  # A function to do a DFS of a 2D boolean matrix. It only considers
  # the 8 cells directly connected to a cell
  DFS <- function(matrix, row, col, visited, cluster) {
    row_stack <- c(row)
    col_stack <- c(col)
    
    while (length(row_stack) > 0) {
      r <- row_stack[length(row_stack)]
      c <- col_stack[length(col_stack)]
      row_stack <- row_stack[-length(row_stack)]
      col_stack <- col_stack[-length(col_stack)]
      
      if (visited[r, c] == 0) {
        visited[r, c] <- 1
        cluster <- rbind(cluster, c(r, c))
        
        for (k in 1:8) {
          if (is_valid(r + row_nbr[k], c + col_nbr[k])) {
            row_stack <- c(row_stack, r + row_nbr[k])
            col_stack <- c(col_stack, c + col_nbr[k])
          }
        }
      }
    }
    return(cluster)
  }
  
  # The main function that returns all clusters
  get_clusters <- function(matrix, visited) {
    clusters <- list()
    for (i in 1:rows) {
      for (j in 1:cols) {
        if (visited[i, j] == 0 && matrix[i, j] == 1) {
          new_cluster <- DFS(matrix, i, j, visited, matrix(, nrow = 0, ncol = 2))
          if (nrow(new_cluster) >= 3) {
            clusters[[length(clusters) + 1]] <- new_cluster
          }
        }
      }
    }
    return(clusters)
  }
  
  return(get_clusters(matrix, visited))
}

This works and it's fast. However, the function returns ALL possible clusters of size >= 3 (44 total), including smaller clusters nested within larger ones.
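A likely cause, offered as a reading of the code rather than a confirmed diagnosis: R passes arguments by value, so `visited[r, c] <- 1` inside `DFS()` only changes a local copy. The `visited` that `get_clusters()` checks is never updated, so every cell holding a 1 starts its own search and each cluster is reported once per member cell (24 + 8 + 12 = 44 for the cluster sizes listed in the answer below). The sketch below keeps `visited` in a single scope so each cell is claimed once. The name `find_clusters_fixed` and the `min_size` argument are illustrative and not part of the original code.

```r
# Sketch: iterative DFS with one shared `visited` matrix (base R only).
# Assumes `m` is a numeric matrix and that cells equal to 1 form the clusters.
find_clusters_fixed <- function(m, min_size = 3) {
  rows <- nrow(m)
  cols <- ncol(m)
  visited <- matrix(FALSE, nrow = rows, ncol = cols)

  # Offsets of the 8 neighbours (8-connectivity)
  row_nbr <- c(-1, -1, -1,  0, 0,  1, 1, 1)
  col_nbr <- c(-1,  0,  1, -1, 1, -1, 0, 1)

  clusters <- list()
  for (i in seq_len(rows)) {
    for (j in seq_len(cols)) {
      if (m[i, j] != 1 || visited[i, j]) next

      # Start a new cluster at (i, j); mark cells when they are pushed
      stack <- list(c(i, j))
      visited[i, j] <- TRUE
      cells <- list()

      while (length(stack) > 0) {
        cur <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL        # pop the last element
        cells[[length(cells) + 1]] <- cur

        for (k in 1:8) {
          r  <- cur[1] + row_nbr[k]
          cc <- cur[2] + col_nbr[k]
          if (r >= 1 && r <= rows && cc >= 1 && cc <= cols &&
              m[r, cc] == 1 && !visited[r, cc]) {
            visited[r, cc] <- TRUE
            stack[[length(stack) + 1]] <- c(r, cc)
          }
        }
      }

      # Keep the cluster only if it reaches the minimum size
      if (length(cells) >= min_size) {
        clusters[[length(clusters) + 1]] <- do.call(rbind, cells)
      }
    }
  }
  clusters
}

# Usage with the matrix from the question
set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
clusters <- find_clusters_fixed(matrix_1)
vapply(clusters, nrow, integer(1))  # one size per cluster, no repeats
```

Each element of the returned list is a two-column matrix of (row, column) coordinates, the same shape the original `DFS()` builds with `rbind()`.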

The matrix as a binary image:

my_palette <- c("white", "black")

# correct for how image() reads a matrix
rotate <- function(x) t(apply(x, 2, rev))

image(rotate(matrix_1),
      axes = FALSE,
      col = my_palette)

I see only three clusters of size >= 3. How do I revise my function to "see" only the largest unbroken clusters on a matrix?

UPDATE

Thank you @I_O! I have 10000 100X100 matrices from a MATLAB simulation that models the behavior of sodium channels on a cellular membrane. The following function implements your suggestion and returns the cluster sizes of channel types 1 and 2:

library(dplyr)
library(terra)

# M: matrix of integers 

find_clusters_2chan <- function(M) {
  
  # Consider only 1s
  ones <- M == 1
  
  # convert matrix to raster
  raster_ones <- ones |> rast()
  
  # find clusters (consider zeros as NA, i. e. discontinuation)
  
  clusters_ones <- patches(raster_ones,
                           directions = 8,
                           zeroAsNA = TRUE)
  
  # generate frequency table
  ones_freq <- clusters_ones |> freq()
  
  # return counts >=3
  ones_freq$count %>%
    .[. >= 3] -> ONES
  
  #-------------------------------------------------------------------------------
  
  # Consider only 2s
  twos <- M == 2
  
  # convert matrix to raster
  raster_twos <- twos |> rast()
  
  # find clusters (consider zeros as NA, i. e. discontinuation)
  
  clusters_twos <- patches(raster_twos,
                           directions = 8,
                           zeroAsNA = TRUE)
  
  # generate frequency table
  twos_freq <- clusters_twos |> freq()
  
  # return counts >=3
  twos_freq$count %>%
    .[. >= 3] -> TWOS
  
  clusters_list <- list(channel_1 = ONES,
                        channel_2 = TWOS)
  
  return(clusters_list)
  
}

start <- Sys.time()

clusters_big_list <- lapply(list_of_matrices, find_clusters_2chan)

end <- Sys.time()

end - start

# run time = 3.902859 minutes

I_O · Accepted Answer

If you're fine with using a dedicated package for raster analysis like {terra}, the following should be convenient and fast (edit: a wrapper function that generalizes across channel values and minimum cluster sizes is at the bottom):

library(terra)

set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)

Convert the matrix to a raster:

raster_1 <- matrix_1 |> rast()

Find clusters (zeros are treated as NA, i.e. as breaks between clusters):

the_clusters <- patches(raster_1, directions = 8, zeroAsNA=TRUE)

Inspect the clusters identified:

the_clusters |> plot()

(Figure: plot of the identified clusters)

List cluster sizes (value: cluster ID, count: cluster size):

the_clusters |> freq()

  layer value count
1     1     1    24
2     1     2     8
3     1     3     1
4     1     4     2
5     1     5    12
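To keep only the clusters of at least three cells, the frequency table can be filtered like any data frame. The sketch below retypes the table above by hand so it runs without {terra}. The names `cluster_freq`, `min_size`, and `big_clusters` are illustrative.

```r
# Frequency table copied by hand from the freq() output above
cluster_freq <- data.frame(
  layer = 1,
  value = 1:5,
  count = c(24, 8, 1, 2, 12)
)

# Keep clusters with at least `min_size` cells (inclusive threshold)
min_size <- 3
big_clusters <- cluster_freq[cluster_freq$count >= min_size, ]

big_clusters$count
# 24  8 12
```

This leaves three clusters, which matches the three clusters of size >= 3 visible in the binary image from the question.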

A function to extract clusters larger than a given size (the filter is `count > min_count`, so clusters of exactly `min_count` cells are excluded) for any number of channels could look like this:

find_clusters_2_all_chans <- 
  function(M, channel_numbers, min_count) {
    channel_numbers |> 
      Map(f = \(i){
        M |> 
          rast() |>
          app(fun = \(cell) ifelse(cell == i, cell, NA)) |>
          patches(directions = 8, zeroAsNA = TRUE) |>
          freq()|>
          (\(m) m[m['count'] > min_count,])()
      }) |>
      setNames(paste0('channel_', channel_numbers))
  }

Example: pick clusters larger than three pixels for channels 1 and 2:

find_clusters_2_all_chans(M, 1:2, 3)

> $channel_1
  layer value count
1     1     1     4
2     1     2    10
3     1     3     6
4     1     5     6
6     1     7     4

$channel_2
  layer value count
5     1     5    12
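If only the single largest cluster per channel is needed, the result list can be reduced with `vapply()`. This is a sketch: the list below is retyped by hand from the example output above, and the names `result` and `largest` are illustrative.

```r
# Result list copied by hand from the example output above
result <- list(
  channel_1 = data.frame(layer = 1,
                         value = c(1, 2, 3, 5, 7),
                         count = c(4, 10, 6, 6, 4)),
  channel_2 = data.frame(layer = 1, value = 5, count = 12)
)

# Largest cluster size per channel; NA if a channel has no qualifying cluster
largest <- vapply(
  result,
  function(d) if (nrow(d) == 0) NA_real_ else max(d$count),
  numeric(1)
)

largest
# channel_1 channel_2
#        10        12
```

The `nrow(d) == 0` guard avoids the `-Inf` (and warning) that `max()` returns for an empty vector when a channel has no cluster above the threshold.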

Tags: optimization, r, cluster-analysis, depth-first-search

Tavaro Evanis
