Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Templated Rcpp function to erase NA values

Tags:

r

templates

rcpp

I would write a function (using Rcpp) that removes all the NA values from a R vector.

Before doing so, I did a little test function through Rcpp::cppFunction function.

library(inline)
cppFunction('
  Vector<INTSXP> na_test(const Vector<INTSXP>& x) {
    return setdiff(x, Vector<INTSXP>::create(::traits::get_na<INTSXP>()));
  }
')

That works this way:

na_test(c(1, NA, NA, 1, 2, NA))
# [1] 1 2

After that I tried to generalize this function through the C++ template mechanism.

So, in an external .cpp file (sourced through sourceCpp function), I've written:

template <int RTYPE>
Vector<RTYPE> na_test_template(const Vector<RTYPE>& x) {
    return setdiff(x, Vector<RTYPE>::create(::traits::get_na<RTYPE>()));
}

// [[Rcpp::export(na_test_cpp)]]
SEXP na_test(SEXP x) {
    switch(TYPEOF(x)) {
        case INTSXP:
            return na_test_template<INTSXP>(x);
        case REALSXP:
            return na_test_template<REALSXP>(x);
    }
    return R_NilValue;
}

This code compiles but behaves differently and I cannot explain why.

Infact:

na_test_cpp(c(1, NA, NA, 1, 2, NA))
# [1]  2 NA NA NA  1

Why the same function (apparently) behaves differently? What happens here?

like image 443
leodido Avatar asked Apr 11 '13 16:04

leodido


2 Answers

Following up on your answer, I would use something like this as the template:

template <int RTYPE>
Vector<RTYPE> na_omit_template(const Vector<RTYPE>& x) {
    int n = x.size() ;
    int n_out = n - sum( is_na(x) ) ;

    Vector<RTYPE> out(n_out) ;
    for( int i=0, j=0; i<n; i++){
        if( Vector<RTYPE>::is_na( x[i] ) ) continue ;
        out[j++] = x[i];
    }
    return out ;
}

So the idea is to first calculate the length of the result, and then just use Rcpp vector classes instead of std::vector. This will lead to less copies of the data.


With the development version of Rcpp (svn revision >= 4308), it works for me for all types, and we can then use our RCPP_RETURN_VECTOR dispatch macro instead of writing the switch:

// [[Rcpp::export]]
SEXP na_omit( SEXP x ){
     RCPP_RETURN_VECTOR( na_omit_template, x ) ;   
}

na_omit has been included in Rcpp(svn revision >= 4309), with a few modifications, i.e. it can handle named vectors and arbitrary sugar expressions.

like image 178
Romain Francois Avatar answered Oct 29 '22 10:10

Romain Francois


I've continued to investigate a solution about the templating problem (i.e. see @Sameer reply).

So I've written another function and now the templating mechanism works.

In an external .cpp file:

#include <Rcpp.h>
template <int RTYPE, class T>
Vector<RTYPE> na_omit_template(const Vector<RTYPE>& x) {
    typedef typename Vector<RTYPE>::iterator rvector_it;
    if (x.size() == 0) {
        return x;
    }
    std::vector<T> out;
    rvector_it it = x.begin();
    for (; it != x.end(); ++it) {
        if (!Vector<RTYPE>::is_na(*it)) {
            out.push_back(*it);
        }
    }
    return wrap(out);
}

// [[Rcpp::export(na_omit_cpp)]]
SEXP na_omit(SEXP x) {
    switch(TYPEOF(x)) {
        case INTSXP:
            return na_omit_template<INTSXP, int>(x);
        case REALSXP:                   
            return na_omit_template<REALSXP, double>(x);
        case LGLSXP:
            return na_omit_template<LGLSXP, bool>(x);
        case CPLXSXP:
            return na_omit_template<CPLXSXP, Rcomplex>(x);
        case RAWSXP:
            return na_omit_template<RAWSXP, Rbyte>(x);
        default:
            stop("unsupported data type");
    }
}

This function removes the NA values, that was my initial purpose.

Unfortunately, at the moment, it does not work for all types of vectors, as the R examples below show.

library(Rcpp)
sourceCpp('file.cpp')
na_omit_cpp(as.integer(c(1, NA, NA, 1, 2, NA))) # OK
# [1] 1 1 2
na_omit_cpp(as.numeric(c(1, NA, NA, 1, 2, NA)))
# [1] 1 1 2
na_omit_cpp(c(NA, 1L, NA, 3L, NA)) # OK
# [1] 1 3
na_omit_cpp(c(NA, 2L, 1, NA)) # OK
# [1] 2 1
na_omit_cpp(c(1.0, 1.1, 2.2, NA, 3, NA, 4)) # OK
# [1] 1.0 1.1 2.2 3.0 4.0
na_omit_cpp(c(1L, NaN, NaN, 0, NA)) # OK
# [1]   1 NaN NaN   0
na_omit_cpp(c(NA, NaN, 1.0, 0.0, 2.2, NA, 3.3, NA, 4.4)) # OK
# [1] NaN 1.0 0.0 2.2 3.3 4.4
na_omit_cpp(as.logical(c(1, 0, 1, NA))) # OK
# [1]  TRUE FALSE  TRUE
na_omit_cpp(as.logical(c(TRUE, FALSE, NA, TRUE, NA))) # OK
# [1]  TRUE FALSE  TRUE
# empty vectors ?
na_omit_cpp(c(NA)) # OK
# logical(0)
na_omit_cpp(numeric(0)) # OK
# numeric(0)
na_omit_cpp(logical(0)) # OK
# logical(0)
na_omit_cpp(raw(0)) # OK
# raw(0)
na_omit_cpp(as.raw(c(40,16,NA,0,2))) # NO! (R converts it to 00)
# [1] 28 10 00 00 02
# Warning message ...
na_omit_cpp(as.complex(c(-1, 2, 1, NA, 0, NA, -1))) # NO! 
# [1] -1+0i  2+0i  1+0i    NA  0+0i    NA -1+0i

So, this function works in almost all cases except for raw vectors and complex vectors.

The current open issues are:

  1. I'm not sure why this bug and I'd like to find out why. Any idea?
  2. The previous templated function, as showed @Sameer, have a strange behaviour.
  3. How to do to accept character vectors?

I thought clearly to case STRSXP: return na_omit_template<STRSXP, ?>(x); but this statement doesn't work substituting std::string, Rcpp:String to ?.

like image 31
leodido Avatar answered Oct 29 '22 09:10

leodido