Ben Bolkers' answer to this question and the article by Uwe Ligges are already very useful
when I try to "decode" a primitive or internal R function.
But how is a primitive R function connected with its corresponding C function?
I guess that somehow .Primitive
must provide this missing link.
Take for example is.na
:
> is.na
function (x) .Primitive("is.na")
FUNTAB R_FunTab[]
in file "names.c" contains
{"is.na", do_isna, 0, 1, 1, {PP_FUNCALL, PREC_FN, 0}},
which means that is.na
uses the C function do_isna
.
do_isna
is defined in file "coerce.c":
SEXP attribute_hidden do_isna(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP ans, dims, names, x;
R_xlen_t i, n;
checkArity(op, args);
check1arg(args, call, "x");
if (DispatchOrEval(call, op, "is.na", args, rho, &ans, 1, 1))
return(ans);
PROTECT(args = ans);
#ifdef stringent_is
if (!isList(CAR(args)) && !isVector(CAR(args)))
errorcall_return(call, "is.na " R_MSG_list_vec);
#endif
x = CAR(args);
n = xlength(x);
PROTECT(ans = allocVector(LGLSXP, n));
if (isVector(x)) {
PROTECT(dims = getAttrib(x, R_DimSymbol));
if (isArray(x))
PROTECT(names = getAttrib(x, R_DimNamesSymbol));
else
PROTECT(names = getAttrib(x, R_NamesSymbol));
}
else dims = names = R_NilValue;
switch (TYPEOF(x)) {
case LGLSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (LOGICAL(x)[i] == NA_LOGICAL);
break;
case INTSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (INTEGER(x)[i] == NA_INTEGER);
break;
case REALSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
break;
case CPLXSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (ISNAN(COMPLEX(x)[i].r) ||
ISNAN(COMPLEX(x)[i].i));
break;
case STRSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (STRING_ELT(x, i) == NA_STRING);
break;
/* Same code for LISTSXP and VECSXP : */
#define LIST_VEC_NA(s) \
if (!isVector(s) || length(s) != 1) \
LOGICAL(ans)[i] = 0; \
else { \
switch (TYPEOF(s)) { \
case LGLSXP: \
case INTSXP: \
LOGICAL(ans)[i] = (INTEGER(s)[0] == NA_INTEGER); \
break; \
case REALSXP: \
LOGICAL(ans)[i] = ISNAN(REAL(s)[0]); \
break; \
case STRSXP: \
LOGICAL(ans)[i] = (STRING_ELT(s, 0) == NA_STRING); \
break; \
case CPLXSXP: \
LOGICAL(ans)[i] = (ISNAN(COMPLEX(s)[0].r) \
|| ISNAN(COMPLEX(s)[0].i)); \
break; \
default: \
LOGICAL(ans)[i] = 0; \
} \
}
case LISTSXP:
for (i = 0; i < n; i++) {
LIST_VEC_NA(CAR(x));
x = CDR(x);
}
break;
case VECSXP:
for (i = 0; i < n; i++) {
SEXP s = VECTOR_ELT(x, i);
LIST_VEC_NA(s);
}
break;
case RAWSXP:
/* no such thing as a raw NA */
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = 0;
break;
default:
warningcall(call, _("%s() applied to non-(list or vector) of type '%s'"),
"is.na", type2char(TYPEOF(x)));
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = 0;
}
if (dims != R_NilValue)
setAttrib(ans, R_DimSymbol, dims);
if (names != R_NilValue) {
if (isArray(x))
setAttrib(ans, R_DimNamesSymbol, names);
else
setAttrib(ans, R_NamesSymbol, names);
}
if (isVector(x))
UNPROTECT(2);
UNPROTECT(1);
UNPROTECT(1); /*ans*/
return ans;
}
But if we want to evaluate is.na(x=3)
for example, how are the arguments
call
,op
,args
,rho
generated?
At least some external information must be used, x=3
is not enough.
Moreover, at first glance x=3
is not used at all, which must be wrong of course:
> is.na
function (x) .Primitive("is.na")
The R Code of .Primitive
doesn't give a hint:
> .Primitive
function (name) .Primitive(".Primitive")
Taking all this into account, it is not surprising that an apparently excellent copy isNA
of is.na
fails:
> isNA <- function (x) .Primitive("is.na")
> isNA
function (x) .Primitive("is.na")
> is.na
function (x) .Primitive("is.na")
> isNA(x=3)
function (x) .Primitive("is.na")
> is.na(x=3)
[1] FALSE
To put it straight:
All of the C functions do_...
have these arguments
call
,op
,args
,rho
.
By what formula are they calculated when a primitive R function is called?
Great question. I started R under gdb R -d gdb
, set a breakpoint at do_isna
, then continued R and entered is.na(3)
.
$ R -d gdb
(gdb) run
Starting program: /home/mtmorgan/bin/R-3-3-branch/bin/exec/R --no-save --no-restore --silent
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> ## break, cntrl-C
Program received signal SIGINT, Interrupt.
0x00007ffff722fd83 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) b do_isna
Breakpoint 1 at 0x7ffff77e0b3b: file /home/mtmorgan/src/R-3-3-branch/src/main/coerce.c, line 1982.
(gdb) continue
Continuing.
> is.na(3)
Breakpoint 1, do_isna (call=0x1838888, op=0x628218, args=0x1838770, rho=0x63f648)
at /home/mtmorgan/src/R-3-3-branch/src/main/coerce.c:1982
1982 checkArity(op, args);
(gdb)
At the gdb prompt I asked
(gdb) where
#0 do_isna (call=0x1838888, op=0x628218, args=0x1838770, rho=0x63f648) at /home/mtmorgan/src/R-3-3-branch/src/main/coerce.c:1982
#1 0x00007ffff7869170 in Rf_eval (e=0x1838888, rho=0x63f648) at /home/mtmorgan/src/R-3-3-branch/src/main/eval.c:717
#2 0x00007ffff78b36af in Rf_ReplIteration (rho=0x63f648, savestack=0, browselevel=0, state=0x7fffffffcaf0) at /home/mtmorgan/src/R-3-3-branch/src/main/main.c:258
...
Starting from #2, Rf_ReplIteration is the REPL (read-eval-print loop) trying to evalute is.na(3)
. It's provided with the environment where the function is being called from. By the time it calls Rf_eval()
on line 258, it knows the environment and the call
(gdb) call Rf_PrintValue(rho)
<environment: R_GlobalEnv>
(gdb) call Rf_PrintValue(thisExpr)
is.na(3)
By #1 (eval.c:717), R has figured out the values of op
and tmp
.
(gdb) call Rf_PrintValue(op)
function (x) .Primitive("is.na")
(gdb) call TYPEOF(op)
$2 = 8
(type 8 is 'BUILTINSXP', from the table in Rinternals.h). It does this by finding out that e
is a LANGSXP (line 614), that is.na is a SYMSXP (line 670), and that the function it references (op
) is a BUILTINSXP (line 700). It then uses (line 717)
(gdb) call PRIMFUN(op)
$8 = (SEXP (*)(SEXP, SEXP, SEXP, SEXP)) 0x7ffff77e0b20 <do_isna>
to discover that it should invoke do_isna
with the values it's discovered.
Hopefully that removes some of the mystery, and points to relevant parts of the code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With