I would write a function (using Rcpp
) that removes all the NA
values from a R
vector.
Before doing so, I did a little test function through Rcpp::cppFunction
function.
library(inline)
cppFunction('
Vector<INTSXP> na_test(const Vector<INTSXP>& x) {
return setdiff(x, Vector<INTSXP>::create(::traits::get_na<INTSXP>()));
}
')
That works this way:
na_test(c(1, NA, NA, 1, 2, NA))
# [1] 1 2
After that I tried to generalize this function through the C++
template mechanism.
So, in an external .cpp file (sourced through sourceCpp
function), I've written:
template <int RTYPE>
Vector<RTYPE> na_test_template(const Vector<RTYPE>& x) {
return setdiff(x, Vector<RTYPE>::create(::traits::get_na<RTYPE>()));
}
// [[Rcpp::export(na_test_cpp)]]
SEXP na_test(SEXP x) {
switch(TYPEOF(x)) {
case INTSXP:
return na_test_template<INTSXP>(x);
case REALSXP:
return na_test_template<REALSXP>(x);
}
return R_NilValue;
}
This code compiles but behaves differently and I cannot explain why.
Infact:
na_test_cpp(c(1, NA, NA, 1, 2, NA))
# [1] 2 NA NA NA 1
Why the same function (apparently) behaves differently? What happens here?
Following up on your answer, I would use something like this as the template:
template <int RTYPE>
Vector<RTYPE> na_omit_template(const Vector<RTYPE>& x) {
int n = x.size() ;
int n_out = n - sum( is_na(x) ) ;
Vector<RTYPE> out(n_out) ;
for( int i=0, j=0; i<n; i++){
if( Vector<RTYPE>::is_na( x[i] ) ) continue ;
out[j++] = x[i];
}
return out ;
}
So the idea is to first calculate the length of the result, and then just use Rcpp
vector classes instead of std::vector
. This will lead to less copies of the data.
With the development version of Rcpp
(svn revision >= 4308), it works for me for all types, and we can then use our RCPP_RETURN_VECTOR
dispatch macro instead of writing the switch
:
// [[Rcpp::export]]
SEXP na_omit( SEXP x ){
RCPP_RETURN_VECTOR( na_omit_template, x ) ;
}
na_omit
has been included in Rcpp
(svn revision >= 4309), with a few modifications, i.e. it can handle named vectors and arbitrary sugar expressions.
I've continued to investigate a solution about the templating problem (i.e. see @Sameer reply).
So I've written another function and now the templating mechanism works.
In an external .cpp
file:
#include <Rcpp.h>
template <int RTYPE, class T>
Vector<RTYPE> na_omit_template(const Vector<RTYPE>& x) {
typedef typename Vector<RTYPE>::iterator rvector_it;
if (x.size() == 0) {
return x;
}
std::vector<T> out;
rvector_it it = x.begin();
for (; it != x.end(); ++it) {
if (!Vector<RTYPE>::is_na(*it)) {
out.push_back(*it);
}
}
return wrap(out);
}
// [[Rcpp::export(na_omit_cpp)]]
SEXP na_omit(SEXP x) {
switch(TYPEOF(x)) {
case INTSXP:
return na_omit_template<INTSXP, int>(x);
case REALSXP:
return na_omit_template<REALSXP, double>(x);
case LGLSXP:
return na_omit_template<LGLSXP, bool>(x);
case CPLXSXP:
return na_omit_template<CPLXSXP, Rcomplex>(x);
case RAWSXP:
return na_omit_template<RAWSXP, Rbyte>(x);
default:
stop("unsupported data type");
}
}
This function removes the NA
values, that was my initial purpose.
Unfortunately, at the moment, it does not work for all types of vectors, as the R
examples below show.
library(Rcpp)
sourceCpp('file.cpp')
na_omit_cpp(as.integer(c(1, NA, NA, 1, 2, NA))) # OK
# [1] 1 1 2
na_omit_cpp(as.numeric(c(1, NA, NA, 1, 2, NA)))
# [1] 1 1 2
na_omit_cpp(c(NA, 1L, NA, 3L, NA)) # OK
# [1] 1 3
na_omit_cpp(c(NA, 2L, 1, NA)) # OK
# [1] 2 1
na_omit_cpp(c(1.0, 1.1, 2.2, NA, 3, NA, 4)) # OK
# [1] 1.0 1.1 2.2 3.0 4.0
na_omit_cpp(c(1L, NaN, NaN, 0, NA)) # OK
# [1] 1 NaN NaN 0
na_omit_cpp(c(NA, NaN, 1.0, 0.0, 2.2, NA, 3.3, NA, 4.4)) # OK
# [1] NaN 1.0 0.0 2.2 3.3 4.4
na_omit_cpp(as.logical(c(1, 0, 1, NA))) # OK
# [1] TRUE FALSE TRUE
na_omit_cpp(as.logical(c(TRUE, FALSE, NA, TRUE, NA))) # OK
# [1] TRUE FALSE TRUE
# empty vectors ?
na_omit_cpp(c(NA)) # OK
# logical(0)
na_omit_cpp(numeric(0)) # OK
# numeric(0)
na_omit_cpp(logical(0)) # OK
# logical(0)
na_omit_cpp(raw(0)) # OK
# raw(0)
na_omit_cpp(as.raw(c(40,16,NA,0,2))) # NO! (R converts it to 00)
# [1] 28 10 00 00 02
# Warning message ...
na_omit_cpp(as.complex(c(-1, 2, 1, NA, 0, NA, -1))) # NO!
# [1] -1+0i 2+0i 1+0i NA 0+0i NA -1+0i
So, this function works in almost all cases except for raw
vectors and complex
vectors.
The current open issues are:
character
vectors?I thought clearly to case STRSXP: return na_omit_template<STRSXP, ?>(x);
but this statement doesn't work substituting std::string
, Rcpp:String
to ?
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With