
Why do two references to the same vector return different memory addresses for each element of the vector?

Tags:

r

I'm learning R and am currently reading this book. To make sure I understood the concept, I ran the following test, whose result confused me, and I'd appreciate it if you could clarify it. I ran the test directly in the R shell from the terminal (not in RStudio or Emacs ESS).

> library(lobstr)
>
> x <- c(1500,2400,8800)
> y <- x
> ### So the following two lines must return the same memory address
> obj_addr(x)
[1] "0xb23bc50"
> obj_addr(y)
[1] "0xb23bc50"
> ### So as I expected, indeed both x and y point to the same memory 
> ### location: 0xb23bc50
>
>
>
> ### Now let's check that each element can be referenced by the same
> ### memory address either by using x or y
> x[1]
[1] 1500
> y[1]
[1] 1500
> obj_addr(x[1])
[1] "0xc194858"
> obj_addr(y[1])
[1] "0xc17db88"
> ### And here is exactly what I don't understand: x and y point 
> ### to the same memory address, so the same must be true for 
> ### x[1] and y[1]. So how come I obtain two different memory
> ### addresses for the same element of the same vector?
>
>
>
> x[2]
[1] 2400
> y[2]
[1] 2400
> obj_addr(x[2])
[1] "0xc15eca0"
> obj_addr(y[2])
[1] "0xc145d30"
> ### Same problem!
>
>
>
> x[3]
[1] 8800
> y[3]
[1] 8800
> obj_addr(x[3])
[1] "0xc10e9b0"
> obj_addr(y[3])
[1] "0xc0f78e8"
> ### Again the same problem: different memory addresses

Could you tell me where my mistake is and what I've misunderstood in this problem?

user17911 asked Apr 07 '20 12:04


2 Answers

At the C level, every R object is a pointer (called a SEXP) to a struct. That struct holds the bookkeeping information R needs to operate on the object (e.g. its length and its number of references, which tells R when the object must be copied) together with the actual data of the R object that we have access to.

lobstr::obj_addr presumably returns the memory address that a SEXP points to. That region of memory contains both the information about the R object and its data. From within R we cannot (and do not need to) access the address of the actual data inside each R object.
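The reference count mentioned above is what lets two names share one object until one of them is modified. A minimal sketch of that behaviour, assuming the lobstr package from the question is installed (the variable names same_before and same_after are only illustrative):

```r
library(lobstr)

x <- c(1500, 2400, 8800)
y <- x                      # y is bound to the same object as x

same_before <- obj_addr(x) == obj_addr(y)

y[1] <- 0                   # modifying y forces R to copy; x is left untouched

same_after <- obj_addr(x) == obj_addr(y)

print(same_before)
print(same_after)
print(x)
```

Before the assignment both names report the same address; afterwards y is bound to a new object while x still holds the original values.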

As Adam notes in his answer, the function [ copies the nth element of the data contained in the C object to a new C object and returns its SEXP pointer to R. Each time [ is called, a new C object is created and returned to R.

We can't access the memory address of each element of our object's data from R itself. With a bit of experimentation, though, we can trace the respective addresses using the C API:

A function to get the addresses:

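# Compile a small C function (via the inline package) that prints addresses
ff = inline::cfunction(sig = c(x = "integer"), body = '
             /* address of the R object itself (what the SEXP points to) */
             Rprintf("SEXP @ %p\\n", (void *) x);

             /* address where the integer data of the object starts */
             Rprintf("first element of SEXP actual data @ %p\\n", (void *) INTEGER(x));

             /* value and address of each data element */
             for(int i = 0; i < LENGTH(x); i++) 
                 Rprintf("<%d> @ %p\\n", INTEGER(x)[i], (void *) (INTEGER(x) + i));

             return(R_NilValue);
     ')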

And applying to our data:

x = c(1500L, 2400L, 8800L)  #converted to "integer" for convenience
y = x

lobstr::obj_addr(x)
#[1] "0x1d1c0598"
lobstr::obj_addr(y)
#[1] "0x1d1c0598"

ff(x)
#SEXP @ 0x1d1c0598
#first element of SEXP actual data @ 0x1d1c05c8
#<1500> @ 0x1d1c05c8
#<2400> @ 0x1d1c05cc
#<8800> @ 0x1d1c05d0
#NULL
ff(y)
#SEXP @ 0x1d1c0598
#first element of SEXP actual data @ 0x1d1c05c8
#<1500> @ 0x1d1c05c8
#<2400> @ 0x1d1c05cc
#<8800> @ 0x1d1c05d0
#NULL

The difference between the addresses of successive data elements equals the size of the C int type (4 bytes here):

diff(c(strtoi("0x1d1c05c8", 16), 
       strtoi("0x1d1c05cc", 16), 
       strtoi("0x1d1c05d0", 16)))
#[1] 4 4
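The same arithmetic can be applied to the two addresses at the top of the ff(x) output. A small sketch, using only the addresses printed above (they will differ in any other session, and the name header_bytes is just illustrative):

```r
# Addresses copied from the ff(x) output above; they differ between sessions
sexp_addr <- strtoi("0x1d1c0598", 16L)   # where the SEXP points
data_addr <- strtoi("0x1d1c05c8", 16L)   # where the integer data starts

# Distance in bytes between the start of the object and the start of its data
header_bytes <- data_addr - sexp_addr
print(header_bytes)
```

On the machine that produced this output the data starts 48 bytes past the address of the object, which is presumably the space taken by the bookkeeping information. That size is an implementation detail and should not be relied upon.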

Using the [ function:

ff(x[1])
#SEXP @ 0x22998358
#first element of SEXP actual data @ 0x22998388
#<1500> @ 0x22998388
#NULL
ff(x[1])
#SEXP @ 0x22998438
#first element of SEXP actual data @ 0x22998468
#<1500> @ 0x22998468
#NULL

This answer may be more extensive than needed, and it simplifies the actual technicalities, but hopefully it offers a clearer "big" picture.

alexis_laz answered Oct 25 '22 09:10


This is one way to look at it. I am sure there is a more technical view. Remember that in R, nearly everything is a function. This includes the extract function, [. Here is an equivalent statement to x[1]:

> `[`(x, 1)
[1] 1500

So what you are doing is running a function which returns a value (check out ?Extract). That value is a new vector of length one (a double here, since x was created with c(1500,2400,8800)). When you run obj_addr(x[1]), R evaluates the function call x[1] and then gives you the obj_addr() of the returned object, not the address of the first element of the vector that you bound to both x and y.
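A short sketch of this point, assuming the lobstr package from the question is installed (the names a, b and distinct are only illustrative). The two extraction results are kept alive at the same time so that their addresses can be compared safely:

```r
library(lobstr)

x <- c(1500, 2400, 8800)

# x[1] is just syntax for calling the function `[`
stopifnot(identical(x[1], `[`(x, 1)))

# Each call returns its own freshly created object
a <- x[1]
b <- x[1]
distinct <- obj_addr(a) != obj_addr(b)
print(distinct)

# Changing the extracted value does not touch x
a[1] <- 0
print(x[1])
```

Both a and b hold the value 1500 at first, yet they are separate objects with separate addresses, and modifying a leaves x unchanged.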

Adam answered Oct 25 '22 07:10