Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rewriting R's cummin() function using Rcpp and allowing for NAs

Tags:

r

rcpp

I'm learning Rcpp. In this example, I'm attempting to roll my own cummin() function like base R's cummin(), except I'd like my version to have a na.rm argument. This is my attempt

cummin.cpp

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector cummin_cpp(NumericVector x, bool narm = false){
  // Given a numeric vector x, returns a vector of the 
  // same length representing the cumulative minimum value
  // if narm = true, NAs will be ignored (The result may 
  // contain NAs if the first values of x are NA.)
  // if narm = false, the resulting vector will return the 
  // cumulative min until the 1st NA value is encountered
  // at which point all subsequent entries will be NA

  if(narm){
    // Ignore NAs
    for(int i = 1; i < x.size(); i++){
      if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1];
    }
  } else{
    // Don't ignore NAs
    for(int i = 1; i < x.size(); i++){
      if(NumericVector::is_na(x[i-1]) | NumericVector::is_na(x[i])){
        x[i] = NA_REAL;
      } else if(x[i-1] < x[i]){
        x[i] = x[i-1];
      }
    }
  }

  return x;
}

foo.R

library(Rcpp)
sourceCpp("cummin.cpp")

x <- c(3L, 1L, 2L)
cummin(x)  # 3 1 1
cummin_cpp(x)  # 3 1 1

class(cummin(x))  # integer
class(cummin_cpp(x))  # numeric

I have a few questions..

  1. R's standard variable name is na.rm, not narm as I've done. However, it seems I can't use a dot in the c++ variable name. Is there a way around this so I can be consistent with R's convention?
  2. I don't know ahead of time if the user's input is going to be a numeric vector or an integer vector, so I've used Rcpp's NumericVector type. Unfortunately, if the input is integer, the output is cast to numeric unlike base R's cummin() behavior. How do people usually deal with this issue?
  3. The line if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1]; seems silly, but I don't know a better way to do this. Suggestions here?
like image 676
Ben Avatar asked Aug 24 '18 22:08

Ben


1 Answers

I would use this:

template<typename T, int RTYPE>
Vector<RTYPE> cummin_cpp2(Vector<RTYPE> x, bool narm){

  Vector<RTYPE> res = clone(x);
  int i = 1, n = res.size();
  T na;

  if(narm){
    // Ignore NAs
    for(; i < n; i++){
      if(ISNAN(res[i]) || (res[i-1] < res[i])) res[i] = res[i-1];
    }
  } else{
    // Do not ignore NAs
    for(; i < n; i++){
      if(ISNAN(res[i-1])) {
        na = res[i-1];
        break;
      } else if(res[i-1] < res[i]){
        res[i] = res[i-1];
      }
    }
    for(; i < n; i++){
      res[i] = na;
    }
  }

  return res;
}


// [[Rcpp::export]]
SEXP cummin_cpp2(SEXP x, bool narm = false) {
  switch (TYPEOF(x)) {
  case INTSXP:  return cummin_cpp2<int, INTSXP>(x, narm);
  case REALSXP: return cummin_cpp2<double, REALSXP>(x, narm);
  default: Rcpp::stop("SEXP Type Not Supported."); 
  }
}

Try this on:

x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)

cummin_cpp(x, narm = TRUE)
x

cummin_cpp(x2)
x2


x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
x3 <- replace(x, is.na(x), NaN)

cummin_cpp2(x, narm = TRUE)
x

cummin_cpp2(x2)
x2

cummin_cpp2(x3)
x3

Explanation:

  1. Joran's advice is good, just wrap that in an R function
  2. I use a dispatcher as Joseph Wood suggested
  3. Beware that x is passed by reference and is modified if of the same type of what you declared (see these 2 slides)
  4. You need to handle NA as well as NaN
  5. You can use || instead of | to evaluate only the first condition if it is true.
like image 171
F. Privé Avatar answered Sep 23 '22 18:09

F. Privé