Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Randomly selecting values from an existing matrix after adding a vector (in R)

Tags:

r

matrix

sampling

Thank you so much for your help in advance!

I am trying to modify an existing matrix such that, when a new line is added to the matrix, it removes values from the preexisting matrix.

For example, I have the matrix:

[,1] [,2] [,3] [,4]
 1     1    0    0
 0     1    0    0
 1     0    1    0
 0     0    1    1

I want to add another vector, I.vec, which has two values (I.vec=c(0,1,1,0)). This is easy enough to do. I just rbind it to the matrix. Now, for every column where I.vec is equal to 1, I want to randomly select a value from the other rows and make it zero. Ideally, this would end up with a matrix like:

[,1] [,2] [,3] [,4]
 1     0    0    0
 0     1    0    0
 1     0    0    0
 0     0    1    1
 0     1    1    0

But each time I run the iteration, I want it to randomly sample again.

So this is what I have tried:

mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I<-rbind(mat1,I.vec)
mat.I.r<-mat.I
d1<-mat.I[,which(mat.I[5,]==1)]
mat.I.r[sample(which(d1[1:4]==1),1),which(mat.I[5,]==1)]<-0

But this only deletes one of the two values I would like to delete. I have also tried variations on subsetting the matrix, but I have not been successful.

Thank you again!

like image 379
Laura Avatar asked Jul 28 '11 06:07

Laura


2 Answers

There is a little bit of ambiguity in the description from the OP, so two solutions are suggested:

Assuming that only existing 1s in relevant columns can be set to 0

I'll just alter the original function (see below). The change is to the line defining rows. I now have (there was a bug in the original - the version below is revised to handle deal with the bug):

rows <- sapply(seq_along(cols), 
                   function(x, mat, cols) {
                       ones <- which(mat[,cols[x]] == 1L)
                       out <- if(length(ones) == 1L) {
                                  ones
                              } else {
                                  sample(ones, 1)
                       }
                       out
                   }, mat = mat, cols = cols)

Basically, what this does is, for each column we need to swap a 1 to a 0, we work out which rows of the column contain 1s and sample one of these.

Edit: We have to handle the case where there is only a single 1 in a column. If we just sample from a length 1 vector, R's sample() will treat it as if we wanted to sample from the set seq_len(n) not from the length 1 set n. We handle this now with an if, else statement.

We have to do this individually for each column so we get the correct rows. I suppose we could do some nice manipulation to avoid repeated calls to which() and sample(), but how escapes me at the moment, because we do have to handle the case where there is only one 1 in the column. Here's the finished function (updated to handle the length 1 sample bug in the original):

foo <- function(mat, vec) {
    nr <- nrow(mat)
    nc <- ncol(mat)

    cols <- which(vec == 1L)
    rows <- sapply(seq_along(cols), 
                   function(x, mat, cols) {
                       ones <- which(mat[,cols[x]] == 1L)
                       out <- if(length(ones) == 1L) {
                                  ones
                              } else {
                                  sample(ones, 1)
                              }
                       out
                   }, mat = mat, cols = cols)

    ind <- (nr*(cols-1)) + rows
    mat[ind] <- 0

    mat <- rbind(mat, vec)
    rownames(mat) <- NULL

    mat
}

and here it is in action:

> set.seed(2)
> foo(mat1, ivec)
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    1    0    1    0
[4,]    0    0    0    1
[5,]    0    1    1    0

and it works when there is only one 1 in a column we want to do a swap in:

> foo(mat1, c(0,0,1,1))
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    0    1    0    0
[3,]    1    0    1    0
[4,]    0    0    0    1
[5,]    0    0    1    1

Original Answer: Assuming any value in a relevant column can be set to zero

Here is a vectorised answer, where we treat the matrix as a vector when doing the replacement. Using the example data:

mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
ivec <- c(0,1,1,0)

## Set a seed to make reproducible
set.seed(2)

## number of rows and columns of our matrix
nr <- nrow(mat1)
nc <- ncol(mat1)

## which of ivec are 1L
cols <- which(ivec == 1L)

## sample length(cols) row indices, with replacement
## so same row can be drawn more than once
rows <- sample(seq_len(nr), length(cols), replace = TRUE)

## Compute the index of each rows cols combination
## if we treated mat1 as a vector
ind <- (nr*(cols-1)) + rows
## ind should be of length length(cols)

## copy for illustration
mat2 <- mat1

## replace the indices we want with 0, note sub-setting as a vector
mat2[ind] <- 0

## bind on ivec
mat2 <- rbind(mat2, ivec)

This gives us:

> mat2
     [,1] [,2] [,3] [,4]
        1    0    0    0
        0    1    0    0
        1    0    0    0
        0    0    1    1
ivec    0    1    1    0

If I were doing this more than once or twice, I'd wrap this in a function:

foo <- function(mat, vec) {
    nr <- nrow(mat)
    nc <- ncol(mat)

    cols <- which(vec == 1L)
    rows <- sample(seq_len(nr), length(cols), replace = TRUE)

    ind <- (nr*(cols-1)) + rows
    mat[ind] <- 0

    mat <- rbind(mat, vec)
    rownames(mat) <- NULL

    mat
}

Which gives:

> foo(mat1, ivec)
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    0    1    0    0
[3,]    1    0    1    0
[4,]    0    0    0    1
[5,]    0    1    1    0

If you wanted to do this for multiple ivecs, growing mat1 each time, then you probably don't want to do that in a loop as growing objects is slow (it involves copies etc). But you could just modify the definition of ind to include the extra n rows you bind on for the n ivecs.

like image 52
Gavin Simpson Avatar answered Oct 05 '22 01:10

Gavin Simpson


You could try something like this. Having 'nrow' in there will allow you to run it multiple times with other 'I.vec's. I tried to do this in a single line with 'apply' but couldn't get a matrix to come out again.

mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I.r<-rbind(mat1,I.vec)

for(i in 1:ncol(mat.I.r))
  {
  ifelse(mat.I.r[nrow(mat.I.r),i]==1, mat.I.r[sample(which(mat.I.r[1:(nrow(mat.I.r)-1),i]==1),1), i] <- 0, "")
  }
mat.I.r
like image 43
nzcoops Avatar answered Oct 05 '22 02:10

nzcoops