Max-clique problem in a non-square matrix

Question

I have many non-square matrices like this example:

1    1    0
1    1    0
1    1    0
1    0    1

I'd like a general solution that finds the largest densely connected region in these matrices. So for my example the solution would return rows=c(1, 2, 3), columns=c(1,2). That said, I'm okay with non-optimal solutions, i.e. local minima are fine.

I think this is similar to the max-clique problem. However, my matrices aren't square and they don't represent graphs, so I'm having trouble using network tools for cliques like igraph::cliques(). How do I find densely connected regions of non-square matrices?

For clarification by "dense region" I mean any rectangular block in the matrix that contains all 1s, and this can be arrived at by reordering rows and columns. So ordering of rows and columns in the original matrix doesn't matter and I want to consider all permutations of orderings. I'm really looking for regions that are analogous/equivalent to cliques in an adjacency matrix, but, again, these matrices aren't square.

Szabolcs · Accepted Answer

You are looking for maximal bicliques. igraph does not support these directly yet, but feel free to open a feature request on GitHub here. If you do, it'd be nice if you could look up and reference some algorithms to compute it.

For now, we can reduce this to a standard clique problem by interpreting this non-square matrix as a non-diagonal piece of a symmetric square matrix. This is your matrix:

enter image description here

We transform it to this:

enter image description here

White represents zero, any other colour represents one. I used two non-white colours to make it clear where the original matrix appears.

The original rows 1-4 are still 1-4, the original columns 1-3 are now 5-7.

We can now look for maximal cliques which contain both "row vertices" (1-4) and "column vertices" (5-7).

To implement this with igraph, you can:

Complement the matrix m, take m <- 1 - m.
Use graph_from_incidence_matrix(m) to get a graph corresponding to the complement.
Now you could look for independent vertex sets, see maximal_ivs(). But as I recall, this still uses an inefficient implementation in igraph. So I recommend taking the complementer() graph and looking for max_cliques() instead.

This what we get:

1, 2, 3, 4, 5, which is rows 1-4 and column 1
1, 2, 3, 5, 6, which is rows 1-3 and columns 1-2 (your example solution)
4, 5, 7, which is row 4 and columns 1, 3
5, 6, 7, which we throw away because it contains only columns, and no rows.

I'll leave the detailed solution to you as I'm not much of an R programmer. I did this using IGraph/M in Mathematica.

First steps of an R solution, still needs filtering of results:

library(igraph)

m <- matrix(c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1), nrow = 4, ncol = 3, byrow = TRUE)
g <- graph_from_incidence_matrix(1 - m)

# one way:
maximal_ivs(g)

# another way, likely faster:
max_cliques(complementer(g))

# we are basically finding cliques in:
1-as_adjacency_matrix(g)

ThomasIsCoding · Answer

I think the igraph approach by @szabolcs is the most efficient and concise to address your problem.

Below is another old-school solution with base R only, which is neither efficient nor concise as using igraph, but you probably can find some hints there if you would like to implement without using external helping tools.

f <- function(m) {
    idx <- which(m == 1, TRUE)
    res <- list(list(idx = NULL, sz = 0))
    for (k in 1:nrow(idx)) {
        i1 <- idx[k, 1]
        j1 <- idx[k, 2]
        for (i2 in i1:nrow(m)) {
            for (j2 in j1:ncol(m)) {
                if (all(m[i1:i2, j1:j2, drop = FALSE] == 1)) {
                    i <- i1:i2
                    j <- j1:j2
                    l <- length(i) * length(j)
                    maxsz <- max(unlist(lapply(res, `[[`, "sz")))
                    toadd <- list(list(
                        idx = list(i = i1:i2, j = j1:j2),
                        sz = l
                    ))
                    if (l > maxsz) {
                        res <- toadd
                    }
                    if (l == maxsz) {
                        res <- c(res, toadd)
                    }
                }
            }
        }
    }
    res
}

where res contains both the index ($idx$i for row and $idx$j for column) and size ($sz) info of the max clique(s), if there are multiples (see the example 2 in output section).

Output Examples

Example 1

Given matrix m

> m
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    1    1    0
[3,]    1    1    0
[4,]    1    0    1

we obtain

> f(m)
[[1]]
[[1]]$idx
[[1]]$idx$i
[1] 1 2 3

[[1]]$idx$j
[1] 1 2


[[1]]$sz
[1] 6

Example 2

Given matrix m

> m
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    1    1    0    0
[3,]    1    1    1    1
[4,]    0    1    1    1

we obtain

> f(m)
[[1]]
[[1]]$idx
[[1]]$idx$i
[1] 1 2 3

[[1]]$idx$j
[1] 1 2


[[1]]$sz
[1] 6


[[2]]
[[2]]$idx
[[2]]$idx$i
[1] 3 4

[[2]]$idx$j
[1] 2 3 4


[[2]]$sz
[1] 6

Max-clique problem in a non-square matrix

Tags:

r

matrix

graph-theory

igraph

R Greg Stacey

2 Answers

Szabolcs

Output Examples

ThomasIsCoding

Recent Activity

Donate For Us

Max-clique problem in a non-square matrix

Tags:

r

matrix

graph-theory

igraph

R Greg Stacey

2 Answers

Szabolcs

Output Examples

ThomasIsCoding

Related questions

Recent Activity

Donate For Us