I would like to understand the logic R uses when passing arguments to functions, creating copies of variables, etc. with respect to the memory usage. When does it actually create a copy of the variable vs. just passing a reference to that variable? In particular the situations I am curious about are:
f <- function(x) {x+1}
a <- 1
f(a)
Is a being passed literally, or is a reference to a being passed?
x <- 1
y <- x
Reference or copy? When is this not the case?
If someone could explain this to me, I would highly appreciate it.
No. Those two things are completely unrelated. Shallow copy/deep copy is talking about object copying; whereas pass-by-value/pass-by-reference is talking about the passing of variables.
A shallow copy duplicates only the top-level object and keeps references to the objects nested inside it. A deep copy duplicates the top-level object and recursively copies the nested objects as well. A shallow copy is faster; a deep copy is comparatively slower.
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
Shallow copies are useful when you want to make copies of classes that share one large underlying data structure or set of data.
When R passes variables, it is always by copy rather than by reference. Sometimes, however, no copy is actually made until an assignment occurs. A more accurate description of the process is pass-by-promise. Take a look at the documentation:
?force
?delayedAssign
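To make pass-by-promise concrete, here is a minimal sketch in base R. The names evaluated and g are made up for illustration. The argument expression is not evaluated when g is called, only when force(x) touches it:

```r
# Hypothetical flag, set as a side effect when the argument is evaluated.
evaluated <- FALSE

g <- function(x) {
  before <- evaluated   # the promise for x has not been evaluated yet
  force(x)              # evaluating the promise runs the argument expression
  c(before = before, after = evaluated)
}

# The braces form the (unevaluated) argument expression passed to g.
res <- g({ evaluated <- TRUE; 1 })
print(res)
```

Here res reports FALSE for before and TRUE for after, so the argument was evaluated inside the function body rather than at the call.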
One practical implication is that it is very difficult if not impossible to avoid needing at least twice as much RAM as your objects nominally occupy. Modifying a large object will generally require making a temporary copy.
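A small sketch of why the caller's object survives a modification made inside a function, using only base R. The helper name set_first is hypothetical:

```r
# Hypothetical helper: modifies its argument and returns the result.
set_first <- function(v) {
  v[1] <- 99   # the modification works on a copy local to the function
  v
}

a <- c(1, 2, 3)
b <- set_first(a)

print(a)  # the caller's vector, unchanged
print(b)  # the modified copy
```

For a short while both a and the modified copy exist, which is where the extra memory goes. If your R build has memory profiling enabled, calling tracemem(a) before the function call should report the duplication when it happens.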
Update (2015): I do (and did) agree with Matt Dowle that his data.table package provides an alternate route to assignment that avoids the copy-duplication problem. If that was the update requested, then I didn't understand it at the time the suggestion was made.
There was a recent change in R 3.2.1 in the evaluation rules for apply and Reduce. It was SO-announced with reference to the News here: Returning anonymous functions from lapply - what is going wrong?
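The kind of code affected by that change looks roughly like the sketch below. The factory name make_adder is hypothetical. Because the returned closure only refers to n without evaluating it, lazy evaluation used to interact badly with the loop inside lapply; with arguments forced by lapply, each closure should keep its own value:

```r
# Hypothetical function factory: returns a closure that captures n.
make_adder <- function(n) {
  function(x) x + n
}

adders <- lapply(1:3, make_adder)

# Call each closure with 10; every closure should use its own n.
results <- sapply(adders, function(f) f(10))
print(results)
```

On a current R version this gives 11, 12 and 13. Adding force(n) as the first line of make_adder is a defensive way to get the same result without relying on lapply's behaviour.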
And the interesting paper cited by jhetzel in the comments is now here:
Late answer, but this is a very important aspect of the language design that doesn't get enough coverage on the web (or at least in the usual sources).
x <- c(0,4,2)
lobstr::obj_addr(x)
# [1] "0x7ff25e82b0f8"
y <- x
lobstr::obj_addr(y)
# [1] "0x7ff25e82b0f8"
Notice the identical "memory address", i.e. the location in memory where the object is stored. You can thus confirm that x and y both point to the same object.
Hadley Wickham's Advanced R book touches on this:
Consider this code:
x <- c(1, 2, 3)
It’s easy to read it as: “create an object named ‘x’, containing the values 1, 2, and 3”. Unfortunately, that’s a simplification that will lead to inaccurate predictions about what R is actually doing behind the scenes. It’s more accurate to say that this code is doing two things:
It’s creating an object, a vector of values, c(1, 2, 3). And it’s binding that object to a name, x. In other words, the object, or value, doesn’t have a name; it’s actually the name that has a value.
Note that the memory addresses are ephemeral and change with every new R session.
Now here is the important part.
In R semantics, objects are copied by value. This means that modifying the copy leaves the original object intact. Since copying data in memory is an expensive operation, copies in R are as lazy as possible. They only happen when the new object is actually modified. Source: [R lang documentation][1]
So if we now modify the value of y by appending a value to the vector, y now points to a different "object". This agrees with what the documentation says regarding a copy operation happening "only when the new object is modified" (lazy). y is pointing to a different address than it was previously.
y <- c(y, -3)
print(lobstr::obj_addr(y))
# [1] "0x7ff25e825b48"
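The same point can be checked without lobstr by looking at the values rather than the addresses. This is a minimal base R sketch of the example above, showing that x is left intact once y is rebound:

```r
x <- c(0, 4, 2)
y <- x            # y is bound to the same object; nothing is copied yet
y <- c(y, -3)     # y is rebound to a new, longer vector

print(x)  # still the original three values
print(y)  # the original values followed by -3
```

Only y changes; x still holds 0, 4 and 2.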