Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this insertion fast in R? [duplicate]

Tags:

r

rstudio

By accident i've encountered strange behaviour of "[<-" operator. It behaves differently depending on order of calls and whether i'm using RStudio or just ordinary RGui. I will make myself clear with an example.

x <- 1:10
"[<-"(x, 1, 111)
x[5] <- 123

As far as i know, first assigment shouldn't change x (or maybe i'm wrong?), while the second should do. And in fact the result of above operations is

x
[1]  1  2  3  4  123  6  7  8  9 10

However, when we perform these operations in different order, results are different and x has changed! Meaningly:

x <- 1:10
x[5] <- 123
"[<-"(x, 1, 111)
x
[1] 111   2   3   4   123   6   7   8   9  10

But it only happens when i'm using plain R! In RStudio the behaviour is the same in both options. I've checked it on two machines (one with Fedora one with Win7) and the situation looks exactly the same. I know the 'functional' version ("[<-"(x..)) is probably never used but i'm very curious why it is happening. Could anyone explain that?

==========================

EDIT: Ok, so from comments i get that the reason was that x <- 1:10 has type 'integer' and after replacing x[5] <- 123 it's 'double'. But still remains question why behaviour is different in RStudio? I restart R session and it doesn't change anything.

like image 386
BartekCh Avatar asked Nov 20 '22 22:11

BartekCh


1 Answers

Rstudio's behavior

Rstudio's object browser modifies objects it examines in a way that forces copying upon modification. Specifically, the object browser employs at least one R function whose call internally forces evaluation of the object, in the process resetting the value of the object's named field from 1 to 2. From the R-Internals manual:

When an object is about to be altered, the named field is consulted. A value of 2 means that the object must be duplicated before being changed. [...] A value of 1 is used for situations [...] where in principle two copies of a exist for the duration of the computation [...] but for no longer, and so some primitive functions can be optimized to avoid a copy in this case.

To see that the object browser modifies the named field ([NAM()] in the next code block), compare the results of running the following lines. In the first, both 'lines' are run together, so that Rstudio has no time to 'touch' X before its structure is queried. In the second, each line is pasted in separately, so X is modified before it is examined.

## Pasted in together
x <- 1:10; .Internal(inspect(x))
# @46b47b8 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...

## Pasted in with some delay between lines
x <- 1:10
.Internal(inspect(x))
# @42111b8 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,... 

Once the named field is set to 2, [<-(X, ...) will not modify the original object. Pasting the following into Rstudio all at once modifies X, while pasting it in line-by-line does not:

x <- 1:10
"[<-"(x, 1, 111)

One more consequence of all this is that Rstudio's object browser actually makes some operations slower than they would otherwise be. Again, compare the same two commands first pasted in together, and then one at a time:

## Pasted in together
x <- 1:5e7
system.time(x[1] <- 9L)
#    user  system elapsed 
#       0       0       0 

## Pasted in one at a time
x <- 1:5e7
system.time(x[1] <- 9L)
#    user  system elapsed 
#    0.11    0.04    0.16 

Variable behavior of [<- in R

The behavior of [<- w.r.t. modifying a vector X depends on the storage types of X and of the element being assigned into it. This explains R's behavior but not Rstudio's.

In R, when [<- either appends to a vector X, or performs a subassignment that requires that X's type be modified, X is copied and the value that is returned does not overwrite the pre-existing variable X. (To do that you need to do something like X <- "[<-(X, 2, 100).

So, neither of the following modify X

X <- 1:2         ## Note: typeof(X) --> "integer"

## Subassignment that requires that X be coerced to "numeric" type
"[<-"(X, 2, 100) ## Note: typeof(100) --> "numeric"
X 
# [1]   1   2

## Appending to X
"[<-"(X, 3, 100L)
X
# [1]   1   2

Whenever possible, though, R does allow the [<- function to modify X directly by reference (i.e. without making a copy). "Possible" here includes cases in which a sub-assignment doesn't require that X's type be modified.

So all of the following modify X

X <- c(0i, 0i, 0i, 0i)
"[<-"(X, 1, TRUE)
"[<-"(X, 2, 20L)
"[<-"(X, 3, 3.14)
"[<-"(X, 4, 5+5i)
X
# [1]  1.00+0i 20.00+0i  3.14+0i  5.00+5i
like image 58
Josh O'Brien Avatar answered Dec 09 '22 12:12

Josh O'Brien