
What exactly needs to be PROTECTed when writing C functions for use in R?

I thought this was pretty straightforward: any SEXP object I create in C code must be protected. But it gets a little murkier (to me) when using linked lists and CAR / CDR, etc. I started off with this comment in Writing R Extensions:

Protecting an R object automatically protects all the R objects pointed to in the corresponding SEXPREC, for example all elements of a protected list are automatically protected.

And this from R Internals:

A SEXPREC is a C structure containing the 32-bit header as described above, three pointers (to the attributes, previous and next node) and the node data ...

LISTSXP: Pointers to the CAR, CDR (usually a LISTSXP or NULL) and TAG (a SYMSXP or NULL).

So I interpret this to mean that, if I do something like:

SEXP s, t, u;
PROTECT(s = allocList(2));
SETCAR(s, ScalarLogical(1));
SETCADR(s, ScalarLogical(0));

t = CAR(s);
u = CADR(s);

Then t and u are protected by virtue of being pointers to objects that are within the protected list s (corollary question: is there a way to get the PROTECTED status of an object? Couldn't see anything in Rinternals.h that fit the bill). Yet I see stuff like (from src/main/unique.c):

// Starting on line 1274 (R 3.0.2), note `args` protected by virtue of being
// a function argument

SEXP attribute_hidden do_matchcall(SEXP call, SEXP op, SEXP args, SEXP env)
{
  // omitting a bunch of lines, and then, on line 1347:
  
  PROTECT(b = CAR(args));

  // ...
}

This suggests that the objects within args are not protected, but that seems very odd, since any of them could then have been GCed at any point. Since CAR just returns a pointer to a presumably already protected object, why do we need to protect it here?

BrodieG asked Oct 28 '14 13:10



1 Answer

Think about it this way: PROTECT doesn't actually do anything to the object. Rather, it adds a temporary GC root so that the collector considers the object alive. Any objects it contains are also alive, not because of some protection applied from C, but because they are pointed to by another object that is itself already considered alive, the same as any other live object. So setting the car of a protected list not only keeps the newly stored object alive, it also potentially releases whatever was previously in the car for GC by removing it from that particular live tree (protecting the list didn't recursively affect the elements).
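To make the reachability argument concrete, here is a minimal toy model in plain C. It is not R's implementation: `Node`, `toy_protect`, `toy_unprotect` and `toy_is_live` are hypothetical names invented for this sketch, and the "collector" is only a mark pass that starts from a small root stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cons cell. NOT R's SEXPREC, just enough structure to show reachability. */
typedef struct Node {
    struct Node *car;
    struct Node *cdr;
    bool marked;
} Node;

/* Toy root stack standing in for the PROTECT stack (hypothetical). */
#define STACK_MAX 16
static Node *protect_stack[STACK_MAX];
static int protect_top = 0;

/* Push a root. Nothing is written into the node itself. */
static Node *toy_protect(Node *n) {
    assert(protect_top < STACK_MAX);
    protect_stack[protect_top++] = n;
    return n;
}

/* Pop the `count` most recently pushed roots. */
static void toy_unprotect(int count) {
    assert(count >= 0 && count <= protect_top);
    protect_top -= count;
}

/* Mark everything reachable from n by following car and cdr. */
static void mark(Node *n) {
    if (n == NULL || n->marked) return;
    n->marked = true;
    mark(n->car);
    mark(n->cdr);
}

/* True if `target` is reachable from some root, i.e. would survive
 * a collection in this toy model. `all` lists every node in the heap. */
static bool toy_is_live(Node *all[], size_t n_all, const Node *target) {
    for (size_t i = 0; i < n_all; i++) all[i]->marked = false;
    for (int i = 0; i < protect_top; i++) mark(protect_stack[i]);
    return target->marked;
}
```

In this model, if a list node is pushed with `toy_protect` and its `car` is pointed at node `a`, then `toy_is_live` reports `a` as live even though `a` was never pushed itself. Pointing `car` at a different node `b` makes `b` live and leaves `a` unreachable, which mirrors the SETCAR behaviour described above.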

So in general you aren't going to have an easy way of telling whether an object is "protected" in this wider sense, because it just follows the same rules GC uses everywhere else and there's nothing special about the object. You could potentially trace through the entire PROTECT stack and everything reachable from it to see if you find the object, but that would be... inefficient, to say the least. There's also nothing to say that the path from a PROTECT stack entry to the object in question is the one that will keep it alive for the longest.

The line in do_matchcall is actually there for a completely unrelated reason: protecting CAR(args) only happens in one branch of a conditional - in the other branch, it's a newly created object that gets protected. Redundantly protecting the value in the CAR(args) branch as well guarantees the same number of objects on the PROTECT stack regardless of which branch was taken. That reduces the corresponding UNPROTECT at the end of the function to an operation on a constant number of slots, with no need to repeat the check down there to vary the count.
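The balanced-count pattern can be sketched without R at all. The following is a toy model in plain C, not the code from do_matchcall: `toy_push`, `toy_pop`, `toy_pick` and `toy_depth` are hypothetical names, the stack only counts entries, and `malloc` stands in for allocating a fresh R object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the PROTECT stack (hypothetical, not R's). */
#define TOY_STACK_MAX 16
static const void *toy_stack[TOY_STACK_MAX];
static int toy_depth = 0;

/* Push one entry, like PROTECT(x). */
static const void *toy_push(const void *p) {
    assert(toy_depth < TOY_STACK_MAX);
    toy_stack[toy_depth++] = p;
    return p;
}

/* Pop `count` entries, like UNPROTECT(count). */
static void toy_pop(int count) {
    assert(count >= 0 && count <= toy_depth);
    toy_depth -= count;
}

/* Two branches, one push in each, so a single constant pop suffices. */
static int toy_pick(const int *existing, int use_existing) {
    const int *b;
    int *fresh = NULL;

    if (use_existing) {
        /* Redundant push: the caller already keeps `existing` alive. */
        b = toy_push(existing);
    } else {
        /* Freshly allocated value: this push is the necessary one. */
        fresh = malloc(sizeof *fresh);
        assert(fresh != NULL);
        *fresh = 42;
        b = toy_push(fresh);
    }

    int result = *b;

    toy_pop(1);   /* same count whichever branch ran */
    free(fresh);  /* free(NULL) is a no-op */
    return result;
}
```

Without the redundant push in the first branch, the cleanup would have to repeat the `use_existing` test to decide between popping 0 and 1 entries. With it, `toy_depth` is back at its starting value after every call.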

Leushenko answered Oct 17 '22 04:10
