Why is a length one vector initially at NAM(2)?

Tags: r

I stumbled across this behavior:

> x <- 1:5
> tracemem(x)
[1] "<0x12145b7a8>"
> "names<-"(x, letters[1:5])
a b c d e 
1 2 3 4 5 
> x
a b c d e 
1 2 3 4 5  
> y <- 1L
> tracemem(y)
[1] "<0x12587ed68>"
> "names<-"(y,letters[1])
tracemem[0x12587ed68 -> 0x12587efa8]: 
a 
1
> y
[1] 1 

when trying to help someone figure out why in the former case the vector's names are being modified but in the latter they are not.

Clearly, the length one vector is being copied, while the length 5 vector is being modified in place:

> x <- 1:5
> y <- 1L
> .Internal(inspect(x))
@121467490 13 INTSXP g0c3 [MARK,NAM(1)] (len=5, tl=0) 1,2,3,4,5
> .Internal(inspect(y))
@1258d74d8 13 INTSXP g0c1 [NAM(2)] (len=1, tl=0) 1

Why does the length one vector start out its existence with its NAMED property incremented to 2?

In response to @nograpes comment below, I'm seeing this on OS X 10.7.5 and R 3.0.2.

joran asked Feb 25 '14 at 18:02


1 Answer

Matthew Dowle asked the same question here, and Peter Dalgaard answered thusly:

This is tricky business... I'm not quite sure I'll get it right, but let's try

When you are assigning a constant, the value you assign is already part of the assignment expression, so if you want to modify it, you must duplicate. So NAMED==2 on z <- 1 is basically to prevent you from accidentally "changing the value of 1". If it weren't, then you could get bitten by code like for(i in 1:2) {z <- 1; if(i==1) z[1] <- 2}.
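The loop from the quote can be run as written. The following is only a sketch of that example, with a hypothetical `seen` vector added to record the value of `z` at the end of each pass. If the literal `1` could be modified in place, the second pass would also see `2`.

```r
# 'seen' is a hypothetical helper, not part of the quoted example.
seen <- numeric(0)

for (i in 1:2) {
  z <- 1                  # z is bound to the constant held in the loop body
  if (i == 1) z[1] <- 2   # the change must land on a copy, not on the literal 1
  seen <- c(seen, z)      # record what z looks like on this pass
}

print(seen)  # 2 on the first pass, 1 on the second
```

The second pass still assigns `1`, which is the protection the quote describes.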

This may seem exotic, but really, the rationale is exactly the same as it is for incrementing NAM to 2 whenever doing an assignment of the form x <- y.

As discussed here, R supports a "call by value" illusion to avoid at least some unnecessary copying of objects. So, for instance, x <- y really just binds the symbol x to y's value. The danger of doing that without further precautions, though, is that subsequent modification of x would also modify y and any other symbols linked to y. R gets around this by marking y's value as "linked to" (by setting its NAM=2) as soon as it is assigned (or even potentially assigned) to another symbol.
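A minimal sketch of the illusion described above, using made-up values: after `x <- y` the two symbols may share one object, but modifying `x` must leave `y` untouched.

```r
y <- c(10, 20, 30)
x <- y         # no copy is needed yet; x and y can refer to the same value

x[1] <- 99     # the value is shared, so R duplicates it before modifying

print(y)       # 10 20 30
print(x)       # 99 20 30
```

The bookkeeping that decides when the duplicate is made is internal to R. The only behavior this sketch relies on is that `y` never changes.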

When you do x <- 1, the 1 is more or less just another y whose value is being linked to the symbol x by the assignment expression. It's just that the potential for mischief arising from subsequent modification of x's value (recalling that at this point, it's just a reference to the value of 1!) is awful to imagine. But, as always with assignments of one symbol to another, R sets NAM=2, and no modifications without actual copying are allowed.

The reason x <- 1:10 is different (as are x <- 1:1, x <- c(1), x <- seq(1), and even x <- -1) is that the RHS is actually a function call, and the result of that function call is what's being assigned to x. In these cases, the value of x is not just a reference to the value of some other symbol; modifying x won't potentially change the value of some other symbol, so there is no need to set NAM=2.
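One way to compare the two cases is to trace both kinds of assignment with `tracemem()`. This is a sketch only. It assumes R was built with memory profiling, which `capabilities("profmem")` reports, and the names `a` and `b` are arbitrary. Whether a copy is reported depends on the R version, since the bookkeeping has changed since R 3.0.2, so no expected output is shown.

```r
# tracemem() is only available when R was built with memory profiling.
can_trace <- capabilities("profmem")

a <- 1L                        # RHS is a constant
if (can_trace) tracemem(a)
a[1] <- 2L                     # a "tracemem[...]" line here means a copy was made
if (can_trace) untracemem(a)

b <- c(1L)                     # RHS is a function call returning a fresh vector
if (can_trace) tracemem(b)
b[1] <- 2L                     # compare: is a copy reported for this one?
if (can_trace) untracemem(b)
```

In both cases the final values are the same. Only the reported copying can differ.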

Josh O'Brien answered Oct 15 '22 at 10:10