Thank you so much for your help in advance!
I am trying to modify an existing matrix such that, when a new line is added to the matrix, it removes values from the preexisting matrix.
For example, I have the matrix:
[,1] [,2] [,3] [,4]
1 1 0 0
0 1 0 0
1 0 1 0
0 0 1 1
I want to add another vector, I.vec, which has two values (I.vec=c(0,1,1,0)
).
This is easy enough to do. I just rbind it to the matrix.
Now, for every column where I.vec is equal to 1, I want to randomly select a value from the other rows and make it zero.
Ideally, this would end up with a matrix like:
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
0 1 1 0
But each time I run the iteration, I want it to randomly sample again.
So this is what I have tried:
mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I<-rbind(mat1,I.vec)
mat.I.r<-mat.I
d1<-mat.I[,which(mat.I[5,]==1)]
mat.I.r[sample(which(d1[1:4]==1),1),which(mat.I[5,]==1)]<-0
But this only deletes one of the two values I would like to delete. I have also tried variations on subsetting the matrix, but I have not been successful.
Thank you again!
There is a little bit of ambiguity in the description from the OP, so two solutions are suggested:
1
s in relevant columns can be set to 0
I'll just alter the original function (see below). The change is to the line defining rows
. I now have (there was a bug in the original - the version below is revised to handle deal with the bug):
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] == 1L)
out <- if(length(ones) == 1L) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
Basically, what this does is, for each column we need to swap a 1
to a 0
, we work out which rows of the column contain 1
s and sample one of these.
Edit: We have to handle the case where there is only a single 1
in a column. If we just sample from a length 1 vector, R's sample()
will treat it as if we wanted to sample from the set seq_len(n)
not from the length 1 set n
. We handle this now with an if, else
statement.
We have to do this individually for each column so we get the correct rows. I suppose we could do some nice manipulation to avoid repeated calls to which()
and sample()
, but how escapes me at the moment, because we do have to handle the case where there is only one 1
in the column. Here's the finished function (updated to handle the length 1 sample bug in the original):
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec == 1L)
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] == 1L)
out <- if(length(ones) == 1L) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows
mat[ind] <- 0
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
and here it is in action:
> set.seed(2)
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
and it works when there is only one 1
in a column we want to do a swap in:
> foo(mat1, c(0,0,1,1))
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 0 1 1
Here is a vectorised answer, where we treat the matrix as a vector when doing the replacement. Using the example data:
mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
ivec <- c(0,1,1,0)
## Set a seed to make reproducible
set.seed(2)
## number of rows and columns of our matrix
nr <- nrow(mat1)
nc <- ncol(mat1)
## which of ivec are 1L
cols <- which(ivec == 1L)
## sample length(cols) row indices, with replacement
## so same row can be drawn more than once
rows <- sample(seq_len(nr), length(cols), replace = TRUE)
## Compute the index of each rows cols combination
## if we treated mat1 as a vector
ind <- (nr*(cols-1)) + rows
## ind should be of length length(cols)
## copy for illustration
mat2 <- mat1
## replace the indices we want with 0, note sub-setting as a vector
mat2[ind] <- 0
## bind on ivec
mat2 <- rbind(mat2, ivec)
This gives us:
> mat2
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
ivec 0 1 1 0
If I were doing this more than once or twice, I'd wrap this in a function:
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec == 1L)
rows <- sample(seq_len(nr), length(cols), replace = TRUE)
ind <- (nr*(cols-1)) + rows
mat[ind] <- 0
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
Which gives:
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
If you wanted to do this for multiple ivec
s, growing mat1
each time, then you probably don't want to do that in a loop as growing objects is slow (it involves copies etc). But you could just modify the definition of ind
to include the extra n
rows you bind on for the n
ivec
s.
You could try something like this. Having 'nrow' in there will allow you to run it multiple times with other 'I.vec's. I tried to do this in a single line with 'apply' but couldn't get a matrix to come out again.
mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I.r<-rbind(mat1,I.vec)
for(i in 1:ncol(mat.I.r))
{
ifelse(mat.I.r[nrow(mat.I.r),i]==1, mat.I.r[sample(which(mat.I.r[1:(nrow(mat.I.r)-1),i]==1),1), i] <- 0, "")
}
mat.I.r
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With