 

Assignment to subset of a matrix with repeated indices

Tags: r, subset

Not sure this qualifies for an entry in the R-Inferno, but can someone comment on the logic behind the way the following replacement works?

foo<-matrix(1:6,2)
bar<-foo[2,c(1,3,1)]
bar
# [1] 2 6 2
foo[2,c(1,3,1)]<-foo[2,c(1,3,1)]+5
foo
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    7    4   11

My question is: when generating bar, the repeated coordinate results in a repeated element in the output, but when modifying foo, the repeated coordinate does not result in a repeated addition operation. (By comparison, for (j in c(1,3,1)) foo[2,j] <- foo[2,j] + 5 does.) Why, and how exactly, does [<- essentially ignore the repeated index?
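A minimal sketch reproducing the two behaviours side by side in base R. The object names foo_vec and foo_loop are hypothetical, introduced only so both results can be compared:

```r
# Vectorized replacement: the right-hand side foo_vec[2, c(1, 3, 1)] + 5
# is evaluated once, then written to columns 1, 3, 1 in that order
foo_vec <- matrix(1:6, 2)
foo_vec[2, c(1, 3, 1)] <- foo_vec[2, c(1, 3, 1)] + 5

# Explicit loop: each iteration re-reads the current value before adding
foo_loop <- matrix(1:6, 2)
for (j in c(1, 3, 1)) foo_loop[2, j] <- foo_loop[2, j] + 5

foo_vec[2, ]   # column 1 received +5 only once: 7 4 11
foo_loop[2, ]  # column 1 received +5 twice:    12 4 11
```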

Carl Witthoft asked Mar 10 '14 16:03



2 Answers

From help("[<-"):

Subassignment is done sequentially, so if an index is specified more than once the latest assigned value for an index will result.

foo<-matrix(1:6,2)

foo[1,rep(1,2)] <- c(1,42)

#     [,1] [,2] [,3]
#[1,]   42    3    5
#[2,]    2    4    6
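The same rule on a plain vector, as a small sketch (the vector x is a hypothetical example): when one index appears several times, the assignments happen left to right and the last value wins.

```r
x <- 1:5

# Index 2 is targeted three times; the values are assigned in order,
# so 30, the last one, is what remains at position 2
x[c(2, 2, 2)] <- c(10, 20, 30)
x
# [1]  1 30  3  4  5
```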
Roland answered Sep 28 '22 04:09



To try to answer the secondary question in the comments indirectly:

> library(microbenchmark)
> vec <- 1:10
> microbenchmark(
+       rep(1, 1e4),
+       vec[rep(1, 1e4)] <- 1:1e4,
+       vec[1] <- 1e4
+     )
Unit: microseconds
                          expr     min       lq   median       uq      max neval
                 rep(1, 10000)  16.457  17.9190  18.2860  19.0170 2561.327   100
 vec[rep(1, 10000)] <- 1:10000 215.395 219.7835 227.8285 233.6795 3437.532   100
               vec[1] <- 10000   1.463   2.1950   3.2920   3.8405   22.308   100

This strongly suggests that successive values are written to the same memory location one after another, until only the last one prevails. They are not added together simply because the operation here is overwriting, not adding (though maybe that was not what you were asking with "does not result in a repeated addition operation").
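A rough model of what the direct assignment does, written as a conceptual sketch in plain R (this is not R's actual C-level implementation, and the names idx and rhs are hypothetical): evaluate the right-hand side once, then write its elements to the indices in order.

```r
foo <- matrix(1:6, 2)
idx <- c(1, 3, 1)

# Step 1: read once; the right-hand side is the length-3 vector c(7, 11, 7)
rhs <- foo[2, idx] + 5

# Step 2: write element by element; the second write to column 1
# overwrites the first, here with the same value, 7
for (k in seq_along(idx)) foo[2, idx[k]] <- rhs[k]

foo[2, ]
# [1]  7  4 11
```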

Note that your loop and your direct assignment are not equivalent. In your loop you read, add, and assign, then re-read, re-add, and re-assign, and so on, whereas in your direct assignment you read once, add to the single vector once, and then keep only the last value through overwriting.
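If the loop's accumulating result is what you actually want, one vectorized option (not taken from the answers above; a sketch assuming base R's tabulate()) is to count how often each column index occurs and add the scaled counts in a single step. The names idx and hits are hypothetical:

```r
foo <- matrix(1:6, 2)
idx <- c(1, 3, 1)

# Count occurrences of each column index in 1..ncol(foo): here c(2, 0, 1)
hits <- tabulate(idx, nbins = ncol(foo))

# Add 5 once per occurrence, reproducing the loop's result without a loop
foo[2, ] <- foo[2, ] + 5 * hits
foo[2, ]
# [1] 12  4 11
```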

The key difference between "reading" and "writing" is that a read returns a vector as long as the index vector, whereas a write (excluding the case where you use out-of-bounds indices) cannot change the length of the vector you are writing to.
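A short sketch of that length asymmetry, reusing the vec <- 1:10 setup from the benchmark (the specific values written are arbitrary):

```r
vec <- 1:10

# Reading: one element per index, repeats included
n_read <- length(vec[rep(1, 5)])   # 5

# Writing to an in-bounds index: vec keeps its length of 10,
# and position 1 holds the last value written, 5
vec[rep(1, 5)] <- 1:5
n_after_write <- length(vec)       # 10

# Out-of-bounds writing is the exception: vec grows, padded with NA
vec[15] <- 99L
length(vec)                        # 15
```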

BrodieG answered Sep 28 '22 06:09
