Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

problem with sum function after inplace editing using Rcpp

Tags:

r

sum

rcpp

If modifying a value of an IntegerVector in Rcpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void test(IntegerVector x) {
  x[5] = 77;
}

After running test() function in R :

x <- 10:1
test(x)
print(x)  #  10  9  8  7  6 77  4  3  2  1
sum(x)  # 55

The sum function return the value of the original array 10:1. How can I solve this problem?

There is no problem when using e.g. x <- sample(10L) instead.

like image 449
MACHERKI M E Avatar asked Feb 24 '19 21:02

MACHERKI M E


2 Answers

@F.Privé's suspicion is correct. This is an issue with ALTREP, which Rcpp does not support yet, c.f. Rcpp/#812 and Rcpp/#906. We can see this more explicitly by inspecting the variable x:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void test(IntegerVector x) {
  x[5] = 77;
}

/*** R
x <- 10:1
.Internal(inspect(x))
test(x)
.Internal(inspect(x))
print(x)  #  10  9  8  7  6 77  4  3  2  1
sum(x)  # 55

x <- 10:1
.Internal(inspect(x))
x[6] <- 77L
.Internal(inspect(x))
print(x)  #  10  9  8  7  6 77  4  3  2  1
sum(x)
*/

The first block gives:

> x <- 10:1

> .Internal(inspect(x))
@55f79a9d6c58 13 INTSXP g0c0 [NAM(3)]  10 : 1 (compact)

> test(x)

> .Internal(inspect(x))
@55f79a9d6c58 13 INTSXP g0c0 [NAM(3)]  10 : 1 (expanded)

> print(x)  #  10  9  8  7  6 77  4  3  2  1
 [1] 10  9  8  7  6 77  4  3  2  1

> sum(x) # 55
[1] 55

While the second block gives:

> x <- 10:1

> .Internal(inspect(x))
@55f79b1f9018 13 INTSXP g0c0 [NAM(3)]  10 : 1 (compact)

> x[6] <- 77L

> .Internal(inspect(x))
@55f7a096e5e8 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 10,9,8,7,6,...

> print(x)  #  10  9  8  7  6 77  4  3  2  1
 [1] 10  9  8  7  6 77  4  3  2  1

> sum(x)
[1] 127

So after changing a value in the vector, it still claims to be 10 : 1, for which sum uses a short-cut. See here for further reading (including references) on ALTREP.

For now the only solution seems to be to refrain from altering the function argument.

like image 56
Ralf Stubner Avatar answered Sep 19 '22 11:09

Ralf Stubner


Here is what I posted elsewhere regarding this:

So there's a couple of things here, most of which @ltierney and/or @kalibera alluded to but which perhaps can benefit from more concrete possible coding patterns.

The crux of the matter here is inline modifying the payload of a SEXP by writing to elements of its DATAPTR-addressed memory. Is it ever ok to do (yes), can it be safe to do without confirming/being guaranteed its ok to do (no, never). And for objects created in the same chunk of C/C++ code, you may have such a priori guarantees, but you're not going to for objects passed down from R.

To be concrete, I don't know of any times it would be ok to write to the pointer returned by INTEGER() on a SEXP that lives in R-space without first checking MAYBE_SHARED(). If MAYBE_SHARED(x) returns FALSE, then it's ok, and you can proceed writing to the pointer, exactly as your code did.

If MAYBE_SHARED(x) == TRUE, then you need to duplicate, operate on the copy, and then return that. When you're in C/C++ code, at the level of grabbing data pointers to things, then you, your code needs to explicitly cause that duplication, protect the new duplicated result, etc.

Now the reason that this particular thing is happening is in the compact sequence case is that, unless R itself is built a specific non-default way, compact sequences ALWAYS have NAMED(x) == MAXNAMED (ie 2), from the point of creation. As Luke pointed out, this could change, but is currently by design. So even in a .Call situation where the closure wasn't forcing the NAMED count up, compact sequences will always need duplication before inline modification. And while this is a choice we could have made different in the compact sequence case, Luke's point about other ALTREPs is more important.

There may be ALTREP SEXPs where the memory pointed to by the pointer returned by, e.g., INTEGER is literally not writable memory for one reason or another. The way that those ALTREP classes will declare that is by doing the same thing the compact sequences do, by marking themselves as "IMMUTABLE", ie by doing MARK_NOT_MUTABLE(x) (which currently sets NAMED to MAXNAMED, but which is future-proof against eventual change to reference counting). This declares the contract that the SEXP must be duplicated before any code that grabs the datapointer and writes to it.

Ultimately, I agree this is a really weird unexpected behavior, but its due to a failure to meet the contract which has always been there. It may have been safe(ish, and I still doubt it) to ignore/be lax about in certain cases in the past, but with the advent of ALTREP it now must always be followed for the reasons outlined in this thread.

So all all all code thats going to grab a dataptr from a pre-existing (R-level) SEXP and write to it must follow a pattern along the lines of (or equivalently careful to):

SEXP awesomefun(SEXP x) 
{
    int nprot = 0;
    if(MAYBE_SHARED(x)) {
        PROTECT(x = duplicate(x)); nprot++; 
    }
    /* do awesome things to x that modify it inline
        protect other things as necessary but always increment nprot when you do, 
        decrement nprot if you ever unprotect */
    if(nprot) UNPROTECT(nprot);
    return x;
}

Any code which writes to datapointers retrieved from SEXPs it didn't itself create (ie anything that comes down from the R side) without doing this was already violating the C-API contract but is now also fatally ALTREP unsafe, as the motivating example showed.

And, one more time, remember, compact sequences could behave differently so that that code happened to work, but other ALTREP classes (Luke mentions memory-mapped-file backed ALTREPS) couldn't, so the behavior of compact sequences isn't actually the issue here.

I hope that is helpful and makes things clearer.

Best

like image 42
Gabe Becker Avatar answered Sep 19 '22 11:09

Gabe Becker