
R, deep vs. shallow copies, pass by reference

I would like to understand the logic R uses when passing arguments to functions, creating copies of variables, etc., with respect to memory usage. When does it actually create a copy of the variable, and when does it just pass a reference to that variable? In particular, the situations I am curious about are:

f <- function(x) {x+1}
a <- 1
f(a)

Is a being passed literally or is a reference to a being passed?

x <- 1
y <- x

Reference or copy? When is this not the case?

If someone could explain this to me, I would highly appreciate it.

asked May 18 '12 at 15:05 by Alex


2 Answers

When R passes variables, it is always by copy rather than by reference. Sometimes, however, a copy is not made until an assignment actually occurs. The real description of the process is pass-by-promise. Take a look at the documentation:

?force
?delayedAssign
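
A minimal sketch of what pass-by-promise means in practice, using only base R. The function names ignore_arg and force_arg are hypothetical and exist only for this illustration. The argument expression is wrapped in a promise and is evaluated only if the function body actually uses it (or calls force() on it):

```r
# Hypothetical helper: returns a constant and never touches `x`.
ignore_arg <- function(x) "x was never evaluated"

# The argument expression would raise an error if it were evaluated,
# but the promise is never forced, so the call succeeds.
res_lazy <- ignore_arg(stop("argument evaluated"))

# Same idea, but force() evaluates the promise immediately.
force_arg <- function(x) {
  force(x)
  "x was evaluated"
}

# Here the error inside the promise is raised and caught.
res_forced <- tryCatch(
  force_arg(stop("argument evaluated")),
  error = function(e) conditionMessage(e)
)
```

Under these assumptions, res_lazy holds the string returned by ignore_arg, while res_forced holds the error message produced when the promise was forced.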

One practical implication is that it is very difficult if not impossible to avoid needing at least twice as much RAM as your objects nominally occupy. Modifying a large object will generally require making a temporary copy.

Update (2015): I do (and did) agree with Matt Dowle that his data.table package provides an alternate route to assignment that avoids the copy-duplication problem. If that was the update requested, then I didn't understand it at the time the suggestion was made.

There was a recent change in R 3.2.1 in the evaluation rules for apply and Reduce. It was announced on SO, with reference to the NEWS file, in: Returning anonymous functions from lapply - what is going wrong?
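
The kind of code affected by that change can be sketched as follows. This is an illustration, not taken from the linked question; adders and make_adder are hypothetical names. It assumes a version of R in which lapply forces the arguments of the function it applies. On older versions, the closures could all end up seeing the last value of i, and the explicit force() in make_adder was the usual workaround:

```r
# Each element is a closure that captures its own `i`.
# Assumes an R version in which lapply forces `i` for every call.
adders <- lapply(1:3, function(i) function(x) x + i)
first_lazy <- adders[[1]](10)

# Version-independent variant: force the promise before returning
# the closure, so `i` is fixed at creation time.
make_adder <- function(i) {
  force(i)
  function(x) x + i
}
adders_forced <- lapply(1:3, make_adder)
first_forced <- adders_forced[[1]](10)
```

With the forcing behaviour in place, both first_lazy and first_forced should equal 11.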

And the interesting paper cited by jhetzel in the comments is now here:

answered Sep 20 '22 at 02:09 by IRTFM


Late answer, but this is a very important aspect of the language design that doesn't get enough coverage on the web (or at least in the usual sources).

x <- c(0,4,2)
lobstr::obj_addr(x)
# [1] "0x7ff25e82b0f8"
y <- x
lobstr::obj_addr(y)
# [1] "0x7ff25e82b0f8"

Notice the identical "memory address", i.e. the location in memory where the object is stored. You can thus confirm that x and y both point to the same object.

Hadley Wickham's Advanced R book touches on this:

Consider this code:

x <- c(1, 2, 3)

It’s easy to read it as: “create an object named ‘x’, containing the values 1, 2, and 3”. Unfortunately, that’s a simplification that will lead to inaccurate predictions about what R is actually doing behind the scenes. It’s more accurate to say that this code is doing two things:

It’s creating an object, a vector of values, c(1, 2, 3). And it’s binding that object to a name, x. In other words, the object, or value, doesn’t have a name; it’s actually the name that has a value.

Note that the memory addresses are ephemeral and change with every new R session.

Now here is the important part.

In R semantics, objects are copied by value. This means that modifying the copy leaves the original object intact. Since copying data in memory is an expensive operation, copies in R are as lazy as possible. They only happen when the new object is actually modified. Source: R language documentation

So if we now modify the value of y by appending a value to the vector, y now points to a different "object". This agrees with what the documentation says regarding a copy operation happening "only when the new object is modified" (lazy). y is pointing to a different address than it was previously.

y <- c(y, -3)
print(lobstr::obj_addr(y))
# [1] "0x7ff25e825b48"
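
The same copy-on-modify behaviour applies to function arguments. As a small sketch in base R (set_first is a hypothetical function used only for illustration), modifying an argument inside a function leaves the caller's object intact:

```r
# Hypothetical function that modifies its argument locally.
set_first <- function(v) {
  v[1] <- 100  # the modification applies to a copy local to the function
  v
}

a <- c(0, 4, 2)
b <- set_first(a)

# The caller's vector is unchanged; only the returned value differs.
a_unchanged <- identical(a, c(0, 4, 2))
b_modified <- identical(b, c(100, 4, 2))
```

If you want to see when the duplication actually happens, base R's tracemem() can report it, provided your R build has memory profiling enabled.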
answered Sep 23 '22 at 02:09 by onlyphantom