Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing rows/columns with only one element from a binary matrix

Tags:

r

matrix

I'm trying to remove "singletons" from a binary matrix. Here, singletons refers to elements that are the only "1" value in the row AND the column in which they appear. For example, given the following matrix:

> matrix(c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1), nrow=6)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    0    0    0    0    0
[2,]    1    0    1    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    1    1    0    0    0    0    0
[5,]    0    0    0    0    1    1    1
[6,]    0    0    0    0    1    0    1

...I would like to remove all of row 3 (and, if possible, all of column 4), because the 1 in [3,4] is the only 1 in that row/column combination. [1,2] is fine, since there are other 1's in column [,2]; similarly, [2,3] is fine, since there are other 1's in row [2,]. Any help would be appreciated - thanks!

like image 681
Matt LaFave Avatar asked Jun 18 '15 18:06

Matt LaFave


2 Answers

You first want to find which rows and columns are singletons and then check if there are pairs of singletons rows and columns that share an index. Here is a short bit of code to accomplish this task:

foo <- matrix(c(0,1,0,...))
singRows <- which(rowSums(foo) == 1)
singCols <- which(colSums(foo) == 1)
singCombinations <- expand.grid(singRows, singCols)
singPairs <- singCombinations[apply(singCombinations, 1,
    function(x) which(foo[x[1],] == 1) == x[2]),]
noSingFoo <- foo[-unique(singPairs[,1]), -unique(singPairs[,2])]

With many sinlgeton ros or columns you might need to make this a bit more efficient, but it does the job.

UPDATE: Here is the more efficient version I knew could be done. This way you loop only over the rows (or columns if desired) and not all combinations. Thus it is much more efficient for matrices with many singleton rows/columns.

## starting with foo and singRows as before
singPairRows <- singRows[sapply(singRows, function(singRow)
    sum(foo[,foo[singRow,] == 1]) == 1)]
singPairs <- sapply(singPairRows, function(singRow)
    c(singRow, which(foo[singRow,] == 1)))
noSingFoo <- foo[-singPairs[1,], -singPairs[2,]]

UPDATE 2: I have compared the two methods (mine=nonsparse and @Chris's=sparse) using the rbenchmark package. I have used a range of matrix sizes (from 10 to 1000 rows/columns; square matrices only) and levels of sparsity (from 0.1 to 5 non-zero entries per row/column). The relative level of performance is shown in the heat map below. Equal performance (log2 ratio of run times) is designated by white, faster with sparse method is red and faster with non-sparse method is blue. Note that I am not including the conversion to a sparse matrix in the performance calculation, so that will add some time to the sparse method. Just thought it was worth a little effort to see where this boundary was. Relative Performance

like image 149
cr1msonB1ade Avatar answered Oct 08 '22 01:10

cr1msonB1ade


cr1msonB1ade's way is a great answer. For more computationally intensive matrices (millions x millions), you can use this method:

Encode your matrix in sparse notation:

DT <- structure(list(i = c(1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6), j = c(2, 
                                                             1, 3, 4, 1, 2, 5, 6, 7, 5, 7), val = c(1, 1, 1, 1, 1, 1, 1, 1, 
                                                                                                    1, 1, 1)), .Names = c("i", "j", "val"), row.names = c(NA, -11L
                                                                                                    ), class = "data.frame")

Gives (0s are implicit)

> DT
   i j val
1  1 2   1
2  2 1   1
3  2 3   1
4  3 4   1
5  4 1   1
6  4 2   1
7  5 5   1
8  5 6   1
9  5 7   1
10 6 5   1
11 6 7   1

Then we can filter using:

DT <- data.table(DT)

DT[, rowcount := .N, by = i]
DT[, colcount := .N, by = j]

Giving:

>DT[!(rowcount*colcount == 1)]
    i j val rowcount colcount
 1: 1 2   1        1        2
 2: 2 1   1        2        2
 3: 2 3   1        2        1
 4: 4 1   1        2        2
 5: 4 2   1        2        2
 6: 5 5   1        3        2
 7: 5 6   1        3        1
 8: 5 7   1        3        2
 9: 6 5   1        2        2
10: 6 7   1        2        2

(Note the (3,4) row is now missing)

like image 30
Chris Avatar answered Oct 07 '22 23:10

Chris