
why does as.vector deep copy a matrix?

Tags:

r

Using top, I manually measured the following memory usages at the specific points designated in the comments of the following code block:

x <- matrix(rnorm(1e9),nrow=1e4) 
#~15gb
gc()
# ~7gb after gc()
y <- as.vector(x)
gc()
#~15gb after gc()

It seems clear that rnorm(1e9) creates a ~7 GB vector (1e9 doubles at 8 bytes each), which is then copied to create the matrix. gc() frees the original vector, since it is not bound to any name. as.vector(x) then copies the data again into a plain vector.

My question is, why can't these three objects all point to the same memory block (at least until one is modified)? Isn't a matrix really just a vector with some additional metadata?
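The "vector plus metadata" view can be checked on a small object. The sketch below is only an illustration with made-up sizes: it shows that the only attribute on a numeric matrix is dim, and that adding dim to a plain vector yields an identical matrix. It says nothing about whether the underlying memory is shared.

```r
# A small matrix instead of the 1e9-element one from the question
x <- matrix(rnorm(6), nrow = 2)

# The only attribute on a plain numeric matrix is its dimension vector
attr_names <- names(attributes(x))
print(attr_names)

# as.vector() returns the same values with the attributes stripped
y <- as.vector(x)
print(attributes(y))

# Going the other way: setting dim on a plain vector makes it a matrix
z <- y
dim(z) <- c(2L, 3L)
print(identical(z, x))
```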

This is in R version 3.6.2

edit: also tested in 4.0.3, same results.

asked Feb 03 '21 03:02 by Michael




1 Answer

You are asking about the reasoning behind this design. That seems better suited to R-devel, and I suspect the answer would be "no one knows". The relevant function in the R source is do_asvector.

Following the source code for a call to as.vector(matrix(...)), it is important to note that the default value of the mode argument is "any". This translates to ANYSXP (see R Internals), which lets us find the evil culprit (line 1524) of the copy behaviour.

// source reference: do_asvector
...
    if(type == ANYSXP || TYPEOF(x) == type) {
    switch(TYPEOF(x)) {
    case LGLSXP:
    case INTSXP:
    case REALSXP:
    case CPLXSXP:
    case STRSXP:
    case RAWSXP:
        if(ATTRIB(x) == R_NilValue) return x;
        ans  = MAYBE_REFERENCED(x) ? duplicate(x) : x; // <== evil culprit
        CLEAR_ATTRIB(ans);
        return ans;
    case EXPRSXP:
    case VECSXP:
        return x;
    default:
        ;
    }
...

Going one step further, we can find the definition of MAYBE_REFERENCED in src/include/Rinternals.h, and with a bit of digging we find that it checks whether sxpinfo.named is 0 (false) or not (true). My guess is that the assignment operator <- increments the sxpinfo.named counter, so MAYBE_REFERENCED(x) returns TRUE and we get a duplicate (deep copy).
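One way to look at this counter from the R prompt is the undocumented .Internal(inspect()) call, which dumps an object's header. This is a rough sketch, not a supported API: the output format is version dependent, and as far as I know the field is shown as NAM(n) in R 3.x and as REF(n) in R 4.0.0 and later, where reference counting replaced NAMED.

```r
# Bind a small matrix to a name, as in the question
x <- matrix(c(1, 2, 3, 4), nrow = 2)

# Dump the internal header of x; look for a NAM(...) or REF(...) field
.Internal(inspect(x))

# Bind a second name to the same object and inspect again;
# the count shown in the header may change
y <- x
.Internal(inspect(x))
```

If the header shows a non-zero count after the first assignment, that would be consistent with MAYBE_REFERENCED(x) being true inside do_asvector.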

However, is this behaviour necessary?

That is a great question. If we had passed a mode other than "any" or the type of x (the same as our input), we would skip the duplicate line and continue down the function until we reach ascommon. So I dug a bit further and looked at the source code for ascommon. There we can see that if we convert to a list manually (setting mode = "list"), ascommon only calls shallow_duplicate.

// Source reference: ascommon
---
    if ((type == LISTSXP) &&
        !(TYPEOF(u) == LANGSXP || TYPEOF(u) == LISTSXP ||
          TYPEOF(u) == EXPRSXP || TYPEOF(u) == VECSXP)) {
        if (MAYBE_REFERENCED(v)) v = shallow_duplicate(v); // <=== ascommon duplication behaviour
        CLEAR_ATTRIB(v);
    }
    return v;
    }
---

So one could imagine that the call to duplicate in do_asvector could be replaced by a call to shallow_duplicate. Perhaps a "better safe than sorry" strategy was chosen when the code was originally implemented (prior to R-2.13.0 according to a comment in the source code), or perhaps there is a scenario in one of the types not handled by ascommon that requires a deep-copy.

For now I would test whether the function still makes a deep copy if we set mode = "list" or pass the matrix without first assigning it to a name. In either case it might not be a bad idea to send a follow-up question to the R-devel mailing list.

Edit: <- behaviour

I took the liberty of confirming my suspicion by looking at the source code for <-. I previously stated that I assumed <- incremented sxpinfo.named, and we can confirm this by looking at do_set (the C source code for <-). When assigning as x <- ..., x is a SYMSXP, and thus we can see that the source code calls INCREMENT_NAMED, which in turn calls SET_NAMED(x, NAMED(x) + 1). So, everything else being equal, we should see copy behaviour for x <- matrix(...); y <- as.vector(x), while we shouldn't for y <- as.vector(matrix(...)).
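The assigned case can be checked empirically with tracemem(), which reports when a traced object is duplicated. This is a minimal sketch that assumes an R build with memory profiling enabled (checked through capabilities("profmem")) and uses a much smaller matrix than the one in the question. tracemem() needs a named object, so it cannot cover the unassigned call as.vector(matrix(...)).

```r
# A 1e6-element matrix: large enough to matter, small enough to run anywhere
x <- matrix(rnorm(1e6), nrow = 1e3)

if (capabilities("profmem")) {
  invisible(tracemem(x))  # report any duplication of x from here on
  y <- as.vector(x)       # a "tracemem[... -> ...]" line here indicates a copy
  untracemem(x)
} else {
  y <- as.vector(x)
  message("This R build lacks memory profiling; tracemem() is unavailable.")
}

# Either way, y holds the same values without the dim attribute
print(is.null(dim(y)))
print(identical(y, c(x)))
```

For the unassigned case, comparing memory use in top or the "max used" column of gc(), as in the question, is probably the simplest check.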

answered Sep 27 '22 22:09 by Oliver