
Create a matrix of 0s and 1s, such that each row has only one 1 and each column has at least two 1s

Tags: r, matrix

I want to create a 100x4 matrix of 0s and 1s in R, such that each row has exactly one 1 and each column has at least two 1s.

library(Matrix)  # rsparsematrix() is provided by the Matrix package
MyMat <- as.matrix(rsparsematrix(nrow = 100, ncol = 4, nnz = 100))

I was thinking of rsparsematrix, but I am not sure how to impose the required conditions.

Edit: my other attempt would be dummy_cols, but either way I am still stuck on applying the two conditions. I suspect there is a more straightforward way to create such a matrix.

asked Mar 01 '23 at 11:03 by Mathica
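Both conditions can be checked mechanically, which is handy for testing any of the approaches below. A minimal sketch in base R; `is_valid()` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: TRUE if m is a 0/1 matrix in which every row
# contains exactly one 1 and every column contains at least two 1s.
is_valid <- function(m) {
  all(m %in% c(0, 1)) &&      # only 0s and 1s
    all(rowSums(m) == 1) &&   # exactly one 1 per row
    all(colSums(m) >= 2)      # at least two 1s per column
}

is_valid(diag(4))                  # [1] FALSE: each column has only one 1
is_valid(rbind(diag(4), diag(4)))  # [1] TRUE
```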


2 Answers

1) A matrix consisting of 25 4x4 identity matrices stacked on top of one another satisfies these requirements:

m <- matrix(1, 25) %x% diag(4)
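A quick check of why this works: `%x%` is base R's Kronecker product operator (`kronecker()`), and `matrix(1, 25)` is a 25 x 1 column of ones, so each of those ones is replaced by a copy of `diag(4)`. The comparison with `diag(4)[rep(...), ]` is an equivalent construction added here for illustration only.

```r
# matrix(1, 25) is a 25 x 1 column of ones; the Kronecker product replaces
# each of those ones by a copy of diag(4), giving 25 stacked identity blocks.
m <- matrix(1, 25) %x% diag(4)
dim(m)        # [1] 100   4
colSums(m)    # [1] 25 25 25 25

# The same matrix built by indexing rows of the identity: 1, 2, 3, 4, 1, 2, ...
all(m == diag(4)[rep(1:4, times = 25), ])   # [1] TRUE
```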

2) Exchanging the two arguments of %x% also works, and gives a different matrix that also satisfies the requirements.
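Swapping the arguments changes the layout from interleaved to block-wise. A short sketch; the `diag(4)[rep(..., each = 25), ]` comparison is an equivalent construction added for illustration.

```r
# With the arguments swapped, each 1 in diag(4) becomes a 25 x 1 column of
# ones: rows 1-25 have their 1 in column 1, rows 26-50 in column 2, etc.
m2 <- diag(4) %x% matrix(1, 25)
colSums(m2)                                 # [1] 25 25 25 25
all(m2 == diag(4)[rep(1:4, each = 25), ])   # [1] TRUE
```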

3) Any permutation of the rows and columns of the solution matrices in (1) and (2) also satisfies the conditions:

m[sample(100), sample(4)]
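Permuting rows and columns only reorders the row sums and column sums, so both conditions survive the shuffle. A small sketch; the seed is arbitrary and only there for reproducibility.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
m <- matrix(1, 25) %x% diag(4)
p <- m[sample(100), sample(4)]   # shuffle rows, then columns
# Row and column sums are merely reordered, so both conditions still hold.
all(rowSums(p) == 1)             # [1] TRUE
colSums(p)                       # [1] 25 25 25 25
```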

4) If the objective is to generate a random 0/1 table whose row sums are each 1 and whose column sums are each 25, use r2dtable:

r <- r2dtable(1, rep(1, 100), rep(25, 4))[[1]]
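`r2dtable()` (from the stats package) returns a list of `n` random two-way tables with the given row and column totals, hence the `[[1]]`. A quick sanity check of the result; the seed is arbitrary.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
r <- r2dtable(1, rep(1, 100), rep(25, 4))[[1]]
all(r %in% c(0, 1))    # [1] TRUE: a row total of 1 forces 0/1 entries
all(rowSums(r) == 1)   # [1] TRUE
colSums(r)             # [1] 25 25 25 25
```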

5) Or, to allow any column sums of at least 2:

rsums <- rep(1, 100)
csums <- rmultinom(1, 92, rep(0.25, 4)) + 2
r <- r2dtable(1, rsums, csums)[[1]]
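The column totals work out because 92 + 4 * 2 = 100: `rmultinom()` spreads 92 "free" 1s over the four columns and the `+ 2` guarantees each column its minimum of two. A sketch that verifies this; `drop()` is added here only to turn the 4 x 1 matrix returned by `rmultinom()` into a plain vector, and the seed is arbitrary.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
# Spread 92 ones over 4 columns at random, then add the guaranteed 2 each.
csums <- drop(rmultinom(1, 92, rep(0.25, 4))) + 2
sum(csums)                 # [1] 100, matching the 100 row totals of 1
r <- r2dtable(1, rep(1, 100), csums)[[1]]
all(colSums(r) == csums)   # [1] TRUE
```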
answered Mar 06 '23 at 19:03 by G. Grothendieck


Stochastically, with two rules:

  1. All rows must have exactly one 1; and
  2. All columns must have at least two 1s.

I control the first implicitly by construction; I test against the second.

nr <- 100 ; nc <- 4
set.seed(42)
lim <- 10000
while (lim > 0) {
  lim <- lim - 1
  M <- t(replicate(nr, sample(c(1, rep(0, nc-1)))))
  if (all(colSums(M > 0) >= 2)) break
}
head(M)
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    0    0    0    1
# [3,]    0    0    0    1
# [4,]    0    1    0    0
# [5,]    0    0    0    1
# [6,]    0    1    0    0

colSums(M)
# [1] 25 30 21 24

lim
# [1] 9999
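The row construction inside the loop (one randomly placed 1 per row) can also be written without `replicate()` and `t()` by picking one column per row and indexing rows of an identity matrix. This is a swapped-in vectorized variant, not the answer's code: each row has the same distribution as before, but the random draws differ for the same seed.

```r
nr <- 100 ; nc <- 4
set.seed(42)   # same seed as above, but the draws will not match the loop
# Rule 1 by construction: choose a column uniformly at random for each row
# and take that row of the nc x nc identity matrix.
M <- diag(nc)[sample(nc, nr, replace = TRUE), , drop = FALSE]
all(rowSums(M) == 1)   # [1] TRUE
all(colSums(M) >= 2)   # rule 2 still has to be tested, as in the loop
```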

My use of lim is hardly needed in this example, but it is there as a safeguard against running forever: if you change the dimensions and/or the rules, it may become highly unlikely or even impossible to satisfy all rules, so the cap keeps the execution time bounded. (10000 is completely arbitrary.)
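When the cap is reached, the loop exits with whatever `M` it generated last, which violates rule 2; and `lim == 0` alone is not a reliable signal, since the final iteration could also succeed. A sketch of one way to detect failure, using a hypothetical `found` flag. The dimensions here are chosen so that rule 2 is impossible: four columns with at least two 1s each need at least 8 rows.

```r
nr <- 5 ; nc <- 4   # infeasible on purpose: 4 columns x two 1s needs >= 8 rows
lim <- 1000         # arbitrary cap
found <- FALSE      # hypothetical flag recording whether rule 2 was met
while (lim > 0) {
  lim <- lim - 1
  M <- t(replicate(nr, sample(c(1, rep(0, nc - 1)))))
  if (all(colSums(M > 0) >= 2)) { found <- TRUE; break }
}
if (!found) message("no matrix satisfying both rules within the iteration limit")
found   # [1] FALSE
```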

My point in the comment is that it would be rather difficult to find a 100x4 matrix that satisfies rule 1 but not rule 2. Each row puts its 1 in a given column with probability 0.25 (and a 0 with probability 0.75), independently of the other rows, so the number of 1s in a column follows a Binomial(100, 0.25) distribution, and the probability that a given column contains fewer than two 1s is around 1.1e-11.
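The 1.1e-11 figure can be reproduced with the binomial CDF in base R. The second line is a union bound over the four columns (column counts are not independent, so it is an upper bound rather than an exact probability).

```r
# Count of 1s in one column ~ Binomial(100, 0.25); P(fewer than two 1s):
p_col <- pbinom(1, size = 100, prob = 0.25)
p_col       # about 1.1e-11
4 * p_col   # upper bound on P(some column has fewer than two 1s), about 4.4e-11
```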

answered Mar 06 '23 at 20:03 by r2evans