 

Understanding how .Internal C functions are handled in R

I wonder if anyone can illustrate to me how R executes a C call from an R command typed at the console prompt. I am particularly confused by R's treatment of a) function arguments and b) the function call itself.

Let's take an example, in this case set.seed(). Wondering how it works, I type the name at the prompt, get the source (look here for more on that), and see that it eventually calls .Internal(set.seed(seed, i.knd, normal.kind)). So I dutifully look up the relevant function name in the .Internals section of src/main/names.c, find that it is called do_setseed and lives in RNG.c, which leads me to...

```c
SEXP attribute_hidden do_setseed (SEXP call, SEXP op, SEXP args, SEXP env)
{
    SEXP skind, nkind;
    int seed;

    checkArity(op, args);
    if(!isNull(CAR(args))) {
        seed = asInteger(CAR(args));
        if (seed == NA_INTEGER)
            error(_("supplied seed is not a valid integer"));
    } else seed = TimeToSeed();
    skind = CADR(args);
    nkind = CADDR(args);
    //...
    //DO RNG here
    //...
    return R_NilValue;
}
```
  • What are CAR, CADR, CADDR? My research leads me to believe they are a Lisp-influenced construct concerning lists, but beyond that I do not understand what these functions do or why they are needed.
  • What does checkArity() do?
  • SEXP args seems self-explanatory, but is this a list of the arguments that is passed in the function call?
  • What does SEXP op represent? I take this to mean operator (like in binary functions such as +), but then what is the SEXP call for?

Is anyone able to walk through what happens when I type

set.seed(1) 

at the R console prompt, up to the point at which skind and nkind are defined? I find it hard to follow the source code at this level, and in particular the path from the interpreter to the C function.

asked Oct 29 '13 by Simon O'Hanlon



2 Answers

CAR and CDR are how you access pairlist objects, as explained in section 2.1.11 of the R Language Definition. CAR returns the first element and CDR returns the remaining elements. An example is given in section 5.10.2 of Writing R Extensions:

```c
#include <R.h>
#include <Rinternals.h>

SEXP convolveE(SEXP args)
{
    int i, j, na, nb, nab;
    double *xa, *xb, *xab;
    SEXP a, b, ab;

    a = PROTECT(coerceVector(CADR(args), REALSXP));
    b = PROTECT(coerceVector(CADDR(args), REALSXP));
    ...
}

/* The macros: */
first = CADR(args);
second = CADDR(args);
third = CADDDR(args);
fourth = CAD4R(args);
/* provide convenient ways to access the first four arguments.
 * More generally we can use the CDR and CAR macros as in: */
args = CDR(args); a = CAR(args);
args = CDR(args); b = CAR(args);
```

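There's also a TAG macro to access the names given to the actual arguments. Note that in a .External routine such as convolveE, the first element of args is the name of the routine itself, which is why the first real argument is CADR(args); in a .Internal function such as do_setseed, CAR(args) is already the first argument.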
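To make the access pattern concrete outside of R, here is a toy model in plain C. The Node type, the cons and sum_values helpers, and the int payload are hypothetical simplifications: real R cells hold SEXPs, and the real macros live in Rinternals.h. Only the macro names and the "step with CDR, read with CAR" idiom are taken from the material above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy cons cell: NOT R's SEXPREC, just enough to show the access pattern. */
typedef struct Node {
    int value;          /* payload (a real pairlist cell stores an arbitrary SEXP) */
    const char *tag;    /* argument name, or NULL if the argument is unnamed */
    struct Node *next;  /* rest of the list; NULL plays the role of R_NilValue */
} Node;

/* Toy versions of the accessor macros (names borrowed from Rinternals.h). */
#define CAR(x)   ((x)->value)
#define CDR(x)   ((x)->next)
#define TAG(x)   ((x)->tag)
#define CADR(x)  CAR(CDR(x))
#define CADDR(x) CAR(CDR(CDR(x)))

/* Prepend a cell to rest and return the new head (hypothetical helper). */
static Node *cons(int value, const char *tag, Node *rest) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    n->value = value;
    n->tag = tag;
    n->next = rest;
    return n;
}

/* Same traversal style as the Writing R Extensions snippet:
   read the head with CAR, then step to the rest with CDR. */
static int sum_values(Node *args) {
    int total = 0;
    for (; args != NULL; args = CDR(args))
        total += CAR(args);
    return total;
}

static void free_list(Node *x) {
    while (x != NULL) {
        Node *next = CDR(x);
        free(x);
        x = next;
    }
}
```

For a list built as cons(10, "a", cons(20, "b", cons(30, NULL, NULL))), CAR gives 10, CADR gives 20, CADDR gives 30, TAG(CDR(x)) gives "b", and sum_values gives 60.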

checkArity ensures that the number of arguments passed to the function is correct. args are the actual arguments passed to the function. op identifies which R-level function was called; its offset (read in C with PRIMVAL(op)) is "used for C functions that deal with more than one R function" (quoted from src/main/names.c, which also contains the table showing the offset and arity for each function).

For example, do_colsum handles col/rowSums and col/rowMeans.

```c
/* Table of  .Internal(.) and .Primitive(.)  R functions
 * =====     =========        ==========
 * Each entry is a line with
 *
 *  printname  c-entry     offset  eval  arity   pp-kind   precedence  rightassoc
 *  ---------  -------     ------  ----  -----   -------   ----------  ----------
 */
{"colSums",    do_colsum,  0,      11,   4,     {PP_FUNCALL, PREC_FN,  0}},
{"colMeans",   do_colsum,  1,      11,   4,     {PP_FUNCALL, PREC_FN,  0}},
{"rowSums",    do_colsum,  2,      11,   4,     {PP_FUNCALL, PREC_FN,  0}},
{"rowMeans",   do_colsum,  3,      11,   4,     {PP_FUNCALL, PREC_FN,  0}},
```
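A sketch of why the offset column matters: one C entry point can serve all four rows by switching on the offset of the entry it was reached through. Everything here (FunTabEntry, toy_table, toy_lookup, toy_do_colsum) is hypothetical. The toy takes the matrix and its dimensions directly and ignores na.rm, whereas the real do_colsum receives its four arguments as a pairlist.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a names.c row (and for the op object). */
typedef struct {
    const char *name;  /* printname column */
    int offset;        /* offset column: which variant to compute */
    int arity;         /* arity column */
} FunTabEntry;

static const FunTabEntry toy_table[] = {
    { "colSums",  0, 4 },
    { "colMeans", 1, 4 },
    { "rowSums",  2, 4 },
    { "rowMeans", 3, 4 },
};

/* Find a row by printname; NULL if there is none. */
static const FunTabEntry *toy_lookup(const char *name) {
    for (size_t k = 0; k < sizeof toy_table / sizeof toy_table[0]; k++)
        if (strcmp(toy_table[k].name, name) == 0)
            return &toy_table[k];
    return NULL;
}

/* One C entry point for four R functions. x is an nr-by-nc matrix stored
   column-major (as R stores matrices); out must hold nc values for the
   col* variants and nr values for the row* variants. */
static void toy_do_colsum(const FunTabEntry *op, const double *x,
                          int nr, int nc, double *out) {
    int OP = op->offset;                 /* R's do_colsum reads this via PRIMVAL(op) */
    int by_col  = (OP == 0 || OP == 1);  /* colSums, colMeans */
    int is_mean = (OP == 1 || OP == 3);  /* colMeans, rowMeans */
    int nout = by_col ? nc : nr;

    for (int k = 0; k < nout; k++)
        out[k] = 0.0;
    for (int j = 0; j < nc; j++)
        for (int i = 0; i < nr; i++)
            out[by_col ? j : i] += x[i + j * nr];
    if (is_mean)
        for (int k = 0; k < nout; k++)
            out[k] /= by_col ? nr : nc;
}
```

With x = {1, 2, 3, 4, 5, 6} viewed as a 2 x 3 matrix, the "colSums" row yields 3, 7, 11 and the "rowMeans" row yields 3, 4, even though both go through the same C function.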

Note that arity in the table above is 4 because, even though rowSums et al. have only 3 arguments at the R level, do_colsum takes 4, as you can see from the .Internal call in rowSums:

```r
> rowSums
function (x, na.rm = FALSE, dims = 1L) 
{
    if (is.data.frame(x)) 
        x <- as.matrix(x)
    if (!is.array(x) || length(dn <- dim(x)) < 2L) 
        stop("'x' must be an array of at least two dimensions")
    if (dims < 1L || dims > length(dn) - 1L) 
        stop("invalid 'dims'")
    p <- prod(dn[-(1L:dims)])
    dn <- dn[1L:dims]
    z <- if (is.complex(x)) 
        .Internal(rowSums(Re(x), prod(dn), p, na.rm)) + (0+1i) * 
            .Internal(rowSums(Im(x), prod(dn), p, na.rm))
    else .Internal(rowSums(x, prod(dn), p, na.rm))
    if (length(dn) > 1L) {
        dim(z) <- dn
        dimnames(z) <- dimnames(x)[1L:dims]
    }
    else names(z) <- dimnames(x)[[1L]]
    z
}
```
answered Oct 13 '22 by Joshua Ulrich



The basic C-level pairlist extraction functions are CAR and CDR. (Pairlists are very similar to lists but are implemented as linked lists and are used internally for argument lists.) They have simple R equivalents: x[[1]] and x[-1]. R also provides lots of combinations of the two:

  • CAAR(x) = CAR(CAR(x)) which is equivalent to x[[1]][[1]]
  • CADR(x) = CAR(CDR(x)) which is equivalent to x[-1][[1]], i.e. x[[2]]
  • CADDR(x) = CAR(CDR(CDR(x))) which is equivalent to x[-1][-1][[1]], i.e. x[[3]]
  • and so on

Accessing the nth element of a pairlist is an O(n) operation, unlike accessing the nth element of a list which is O(1). This is why there aren't nicer functions for accessing the nth element of a pairlist.
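The O(n) cost is easy to see in a toy walker: reaching element n means following n CDR links, whereas a vector element is a single index operation. The Node type, toy_nthcdr and vec_elt are hypothetical (the R sources do contain a comparable C helper, nthcdr, which has the same linear cost).

```c
#include <stddef.h>

/* Toy pairlist cell holding an int (hypothetical, not R's SEXPREC). */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

#define CAR(x) ((x)->value)
#define CDR(x) ((x)->next)

/* Follow n CDR links. The loop body runs up to n times, so the cost grows
   linearly with n. Returns NULL if the list has fewer than n + 1 cells. */
static Node *toy_nthcdr(Node *x, int n) {
    while (n-- > 0 && x != NULL)
        x = CDR(x);
    return x;
}

/* Vector access for contrast: one address computation, O(1). */
static int vec_elt(const int *v, int n) {
    return v[n];
}
```

So CAR(toy_nthcdr(x, 2)) is the same element as CADDR(x): both reach it by walking two links.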

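Internal/primitive functions don't do matching by name; they only use positional matching (the R-level wrapper, such as the set.seed closure, has already matched arguments by name before it calls .Internal), which is why they can use this simple system for extracting the arguments.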

Next you need to understand what the arguments to the C function are. I'm not sure where these are documented, so I might not be completely right about the structure, but these should be the general pieces:

  • call: the complete call, as might be captured by match.call()

  • op: the index of the .Internal function called from R. This is needed because there is a many-to-1 mapping from .Internal functions to C functions (e.g. do_summary implements sum, mean, min, max and prod). The number is the third entry in names.c; it's always 0 for do_setseed, so the offset is never used there.

  • args: a pair list of the arguments supplied to the function.

  • env: the environment from which the function was called.
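Putting the four pieces together, here is a toy walk-through of set.seed(1) up to the point where skind and nkind are assigned. It is a simplified model, not R's actual dispatch code: the object type, FunEntry, toy_internal and toy_do_setseed are hypothetical, call is represented by a string, env is unused, and the arguments are assumed to have already been evaluated by the R-level set.seed closure (three arguments, as in the version shown in the question, with kind and normal.kind left at their NULL defaults). The arity check is left out here because in R it happens inside the C function via checkArity.

```c
#include <stdio.h>
#include <string.h>

/* --- toy object model (hypothetical; R's SEXP is much richer) --- */
typedef enum { T_NULL, T_INT, T_STR } Type;           /* T_STR: e.g. a kind name */
typedef struct Obj { Type type; int i; const char *s; } Obj;
typedef struct Node { const Obj *car; struct Node *cdr; } Node;

#define CAR(x)   ((x)->car)
#define CDR(x)   ((x)->cdr)
#define CADR(x)  CAR(CDR(x))
#define CADDR(x) CAR(CDR(CDR(x)))

static const Obj R_Nil = { T_NULL, 0, NULL };          /* stand-in for R's NULL */

/* --- toy function table, loosely modelled on the columns of names.c --- */
typedef struct FunEntry FunEntry;
typedef const Obj *(*CFun)(const char *call, const FunEntry *op,
                           Node *args, void *env);
struct FunEntry { const char *name; CFun cfun; int offset; int arity; };

/* Values the toy C entry extracted, kept so the example can be checked. */
static int seen_seed;
static const Obj *seen_skind, *seen_nkind;

/* Same parameter shape as do_setseed(call, op, args, env). */
static const Obj *toy_do_setseed(const char *call, const FunEntry *op,
                                 Node *args, void *env) {
    (void)call; (void)op; (void)env;   /* unused in this toy */
    /* Arguments are extracted purely by position. */
    seen_seed  = (CAR(args)->type == T_INT) ? CAR(args)->i
                                            : -1; /* -1: stand-in for TimeToSeed() */
    seen_skind = CADR(args);   /* 2nd argument: kind */
    seen_nkind = CADDR(args);  /* 3rd argument: normal.kind */
    /* ... the real function would now set up the RNG ... */
    return &R_Nil;
}

static const FunEntry toy_internals[] = {
    /* name        C entry         offset  arity */
    { "set.seed", toy_do_setseed,  0,      3 },
};

/* Toy .Internal(): look the name up in the table and call the C entry
   with (call, op, args, env), where op points at the matching row. */
static const Obj *toy_internal(const char *name, const char *call,
                               Node *args, void *env) {
    for (size_t k = 0; k < sizeof toy_internals / sizeof toy_internals[0]; k++) {
        const FunEntry *op = &toy_internals[k];
        if (strcmp(op->name, name) == 0)
            return op->cfun(call, op, args, env);
    }
    fprintf(stderr, "toy_internal: no entry named '%s'\n", name);
    return NULL;
}
```

Calling toy_internal("set.seed", ...) with the argument list (1, NULL, NULL) leaves seen_seed at 1 and both seen_skind and seen_nkind pointing at the NULL stand-in, which mirrors the state of seed, skind and nkind in the question's code.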

checkArity is a macro which calls Rf_checkArityCall, which basically looks up the number of arguments (the fifth column in names.c is arity) and makes sure the supplied number matches. You have to follow quite a few macros and functions in C to see what's going on; it's very helpful to have a local copy of the R source that you can grep through.
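A toy version of the checkArity pattern, assuming the macro shape found in R's Defn.h, where checkArity(op, args) expands to a call that also passes a variable named call taken from the enclosing C function. Op, Node, toy_length, toy_checkArityCall and the message text are hypothetical. As in names.c, an arity of -1 is treated as "any number of arguments"; unlike R, which signals an error, the toy simply returns -1.

```c
#include <stdio.h>

/* Toy pairlist cell (hypothetical). */
typedef struct Node { int value; struct Node *cdr; } Node;
#define CDR(x) ((x)->cdr)

/* Hypothetical stand-in for op: just the printname and arity columns. */
typedef struct { const char *name; int arity; } Op;

static int toy_length(const Node *args) {
    int n = 0;
    for (; args != NULL; args = CDR(args))
        n++;
    return n;
}

/* Rough analogue of Rf_checkArityCall: compare the supplied argument count
   with the arity stored for op and report the mismatch using call.
   Returns 0 if the count is acceptable, -1 otherwise. */
static int toy_checkArityCall(const Op *op, const Node *args, const char *call) {
    int n = toy_length(args);
    if (op->arity >= 0 && n != op->arity) {
        fprintf(stderr, "in %s: %d argument(s) supplied to '%s', %d required\n",
                call, n, op->name, op->arity);
        return -1;
    }
    return 0;
}

/* `call` is not a macro parameter: it is picked up from whatever
   function the macro is expanded in. */
#define checkArity(op, args) toy_checkArityCall(op, args, call)

static int toy_do_setseed(const char *call, const Op *op, const Node *args) {
    if (checkArity(op, args) != 0)
        return -1;
    /* ... extract CAR(args), CADR(args), CADDR(args) here ... */
    return 0;
}
```

This is also why the C entry functions keep a parameter literally named call: without it, the macro expansion would not compile.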

answered Oct 13 '22 by hadley
