I have a large vector containing a bunch of double elements. Given an array of percentile vector, such as percentile_vec = c(0.90, 0.91, 0.92, 0.93, 0.94, 0.95)
. I am currently using Rcpp sort
function to sort the large vector and then find the corresponding percentile value. Here is the main codes:
// [[Rcpp::export]]
NumericVector sort_rcpp(Rcpp::NumericVector& x)
{
std::vector<double> tmp = Rcpp::as<std::vector<double>> (x); // or NumericVector tmp = clone(x);
std::sort(tmp.begin(), tmp.end());
return wrap(tmp);
}
// [[Rcpp::export]]
NumericVector percentile_rcpp(Rcpp::NumericVector& x, Rcpp::NumericVector& percentile)
{
NumericVector tmp_sort = sort_rcpp(x);
int size_per = percentile.size();
NumericVector percentile_vec = no_init(size_per);
for (int ii = 0; ii < size_per; ii++)
{
double size_per = tmp_sort.size() * percentile[ii];
double size_per_round;
if (size_per < 1.0)
{
size_per_round = 1.0;
}
else
{
size_per_round = std::round(size_per);
}
percentile_vec[ii] = tmp_sort[size_per_round-1]; // For extreme case such as size_per_round == tmp_sort.size() to avoid overflow
}
return percentile_vec;
}
I also try to call R function quantile(x, c(.90, .91, .92, .93, .94, .95))
in Rcpp by using:
sub_percentile <- function (x)
{
return (quantile(x, c(.90, .91, .92, .93, .94, .95)));
}
source('C:/Users/~Call_R_function.R')
The test rests for x=runif(1E6)
are listed below:
microbenchmark(sub_percentile(x)->aa, percentile_rcpp(x, c(.90, .91, .92, .93, .94, .95))->bb)
#Unit: milliseconds
expr min lq mean median uq max neval
sub_percentile(x) 99.00029 99.24160 99.35339 99.32162 99.41869 100.57160 100
percentile_rcpp(~) 87.13393 87.30904 87.44847 87.40826 87.51547 88.41893 100
I expect a fast speed percentile calculation, yet I assume std::sort(tmp.begin(), tmp.end())
slows down the speed. Is there any better way to get a fast result using C++, RCpp/RcppAramdillo? Thanks.
Branching in a loop could be surely optimized. Use std::min/max calls with ints.
I would solve percent calculation of array indices this way:
uint PerCentIndex( double pc, uint size )
{
return 0.5 + ( double ) ( size - 1 ) * pc;
}
Only this line in the middle of the loop above:
percentile_vec[ii]
= tmp_sort[ PerCentIndex( percentile[ii], tmp_sort.size() ) ];
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With