
Why are some memory addresses reported constant while others change?

I've been trying to keep track of various objects in memory using data.table::address or .Internal(address()), but have noticed that some objects return the same address every time, while others are almost always different. What is going on here?

I've noticed that the addresses of list-like objects (data.tables, data.frames, etc.) remain constant as reported by these functions, whereas if I index into a list with [, i.e. address(lst[1]), I get a different result nearly every time. On the other hand, address(lst[[1]]) returns the same value on each call, and the address of a named constant such as address(pi) stays fixed, whereas address(1) is volatile. Why is this happening?

## Create some data.tables of different sizes and plot the addresses
library(data.table)
par(mfrow = c(2,2))
for (i in 2:5) {
    dat <- data.table(a=1:10^i)
    ## Constants
    addr1 <- address(dat)
    addr2 <- address(dat[[1]])
    addr3 <- address(dat$a)  # same as addr2
    ## Vary
    addrs <- replicate(5000, address(dat[1]))
    plot(density(as.integer(as.hexmode(addrs))), main=sprintf("N: %g", nrow(dat)))
    abline(v=as.integer(as.hexmode(c(addr1, addr2, addr3))), col=1:3, lwd=2, lty=1:3)
    legend("topleft", c("dat", "dat[[1]]", "dat$a"), col=1:3, lwd=2, lty=1:3)
}

Here are some examples of what I'm talking about with different sized data.tables. They are just densities of the results from address(dat[1]) (converted to an integer), and the lines correspond to the constant addresses of the data.table.

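[Figure: 2 x 2 grid of density plots of address(dat[1]), one panel per table size (N = 10^2 to 10^5), with vertical lines marking the constant addresses of dat, dat[[1]] and dat$a]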

Rorschach asked Oct 18 '22 22:10



1 Answer

First off, I can replicate your results, so I did a bit of an investigation and dived through some code.

When you access the first member of dat using dat[1], you are not looking at an existing object: you are creating a slice built from the column stored in dat[[1]] (equivalently dat$a). To take a slice, R builds a new object to hold the result and returns that.

So, basically, you see what you see because the [] syntax for indexing returns a slice containing the first element of dat, which is a copy rather than dat$a itself, and that copy sits at an arbitrary memory location each time.

The [[]] syntax returns the actual column stored in your data.table or data.frame rather than a copy, and hence its address is invariant (or at least it is until you change a member of that column).
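The difference can be checked directly. The following is a minimal sketch, assuming data.table is installed; the table size and the repetition count of 100 are arbitrary choices for illustration, not values taken from the question.

```r
library(data.table)

dat <- data.table(a = 1:1000)

# `[[` and `$` both hand back the stored column, so the address repeats.
col_addrs <- replicate(100, address(dat[[1]]))
length(unique(col_addrs))                     # a single distinct address
identical(address(dat[[1]]), address(dat$a))  # same object either way

# `[` builds a fresh data.table on every call, so the addresses vary.
slice_addrs <- replicate(100, address(dat[1]))
length(unique(slice_addrs))                   # more than one distinct address
```

Counting distinct addresses avoids converting the hexadecimal strings to integers, which may not fit in R's 32-bit integer type on a 64-bit system.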

This can be confusing, because an assignment such as dat[1] = 6 does of course alter the value(s) held in your data structure. However, if you compare address(dat[[1]]) before and after making such a change, you will notice that dat now refers to a different column (the modified copy), e.g.:

> dat <- data.table(a=1:10000)
> dat
           a
    1:     1
    2:     2
    3:     3
    4:     4
    5:     5
   ---      
 9996:  9996
 9997:  9997
 9998:  9998
 9999:  9999
10000: 10000
> address(dat[[1]])
[1] "000000000CF389D8"
> address(dat[[1]])
[1] "000000000CF389D8"
> dat[1] = 100
> address(dat[[1]])
[1] "000000000D035B38"
> dat
           a
    1:   100
    2:     2
    3:     3
    4:     4
    5:     5
   ---      
 9996:  9996
 9997:  9997
 9998:  9998
 9999:  9999
10000: 10000
> 
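For contrast, data.table also provides set() (and :=), which update a column by reference rather than through a copy. A short sketch, assuming data.table is installed and using an integer replacement value so that the column type does not have to change:

```r
library(data.table)

dat <- data.table(a = 1:10000)
before <- address(dat[["a"]])

# set() writes into the existing column vector instead of replacing it.
set(dat, i = 1L, j = "a", value = 100L)
after <- address(dat[["a"]])

identical(before, after)  # TRUE: the column was modified in place
dat[["a"]][1]             # 100
```

Here the address reported for the column is the same before and after the update, unlike in the transcript above.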

Looking at the source code for data.frame (rather than data.table), the code that does the slice indexing ([]) is the [.data.frame method, whereas the direct indexing ([[]]) is the [[.data.frame method. You can see that the latter is simpler and, to cut a long story short, the former returns a copy. If you change a slice directly (e.g. dat[1] = 5), the replacement method [<-.data.frame contains the logic that ensures the data frame now references the updated copy.
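The same behaviour can be observed on a plain data.frame. This sketch borrows address() from data.table purely to read addresses (an assumption of the example, since the question already uses it); the names df, stable and n_slices and the value 100L are illustrative.

```r
library(data.table)  # used only for address()

df <- data.frame(a = 1:1000)

# `[[.data.frame` returns the stored column: the address repeats.
stable <- identical(address(df[[1]]), address(df[[1]]))

# `[.data.frame` returns a new one-column data.frame on each call.
n_slices <- length(unique(replicate(50, address(df[1]))))

# `[<-.data.frame` leaves df bound to an updated copy; compare the column address.
col_before <- address(df[[1]])
df[1, 1] <- 100L
col_after <- address(df[[1]])

stable                            # TRUE
n_slices > 1                      # TRUE
identical(col_before, col_after)  # FALSE would indicate the column was copied
```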

J Richard Snape answered Oct 22 '22 01:10
