Fastest way to compute the cdf of the Normal distribution over vectors - R::pnorm vs erfc vs?

Tags:

I hope my reworded question now fits the criteria of Stackoverflow. Please consider the example below. I am writing a Log-Likelihood function in which computing the cdf over vectors is the most time consuming part. Example 1 uses the R::pnorm, Example 2 approximates the normal cdf with erfc. As you can see the results are sufficiently similar, the ercf version is a bit faster.

In practice (within an MLE) however it turns out that the ercf is not as precise, which lets the algorithm run into inf areas unless one sets the constraints accurately. My questions:

1) Am I missing something? Is it necessary to implement some error handling (for the erfc)?

2) Do you have any other suggestions to speed up the code, or alternatives? Does it pay off to look into parallelizing the for-loop?

require(Rcpp)
require(RcppArmadillo)
require(microbenchmark)

#Example 1 : standard R::pnorm
src1 <- '
NumericVector ppnorm(const arma::vec& x,const arma::vec& mu,const     arma::vec& sigma, int lt, int lg) {
int n = x.size();
arma::vec res(n);
for (int i=0; i<n; i++) {
   res(i) = R::pnorm(x(i),mu(i),sigma(i),lt,lg);
}
return wrap(res);
}
'

#Example 2: approximation with ercf
src2 <- '
NumericVector ppnorm(const arma::vec& x,const arma::vec& mu,const    arma::vec& sigma, int lt, int lg) {
int n = x.size();
arma::vec res(n);
for (int i=0; i<n; i++) {
res(i) = 0.5 * erfc(-(x(i) - mu(i))/sigma(i) * M_SQRT1_2);
}
if (lt==0 & lg==0) {
   return wrap(1 - res);
}
if (lt==1 & lg==0) {
   return wrap(res);
}
if (lt==0 & lg==1) {
   return wrap(log(1 - res));
}
if (lt==1 & lg==1) {
   return wrap(log(res));
}
}
'

#some random numbers
xex  = rnorm(100,5,4)
muex = rnorm(100,3,1)
siex = rnorm(100,0.8,0.3)

#compile c++ functions 
func1 = cppFunction(depends = "RcppArmadillo",code=src1) #R::pnorm
func2 = cppFunction(depends = "RcppArmadillo",code=src2) #ercf

#run with exemplaric data
res1 = func1(xex,muex,siex,1,0)
res2 = func2(xex,muex,siex,1,0)

# sum of squared errors
sum((res1 - res2)^2,na.rm=T)
# 6.474419e-32 ... very small

#benchmarking
 microbenchmark(func1(xex,muex,siex,1,0),func2(xex,muex,siex,1,0),times=10000)
#Unit: microseconds
#expr    min      lq     mean median     uq     max neval
#func1(xex, muex, siex, 1, 0) 11.225 11.9725 13.72518 12.460 13.617 103.654 10000
#func2(xex, muex, siex, 1, 0)  8.360  9.1410 10.62114  9.669 10.769 205.784 10000
#my machine: Ubuntu 14.04 LTS, i7 2640M 2.8 Ghz x 4, 8GB memory, RRO 3.2.0 based on version R 3.2.0

541

asked May 19 '15 15:05

chameau13

1 Answers

1) Well, you really should use R's pnorm() as your 0-th example. You don't, you use the Rcpp interface to it. R's pnorm() is already nicely vectorized R-internally (i.e. on C level) so may well be comparative or even faster than Rcpp. Also it does have the advantage to cover cases of NA, NaN, Inf, etc..

2) If you are talking about MLE, and you are concerned about speed and accuracy, you almost surely should rather work with the logarithms, and maybe not withpnorm() but rather dnorm() ?

answered Sep 21 '22 11:09

Martin Mächler

Related questions
                            
                                C++ code for CPU load and CPU temperature
                            
                                Perfect pass-through
                            
                                Static code analysis for detecting passing a wchar_t* to BSTR
                            
                                Library ABI compatibility between versions of Visual Studio
                            
                                Android: build native GUI app with NDK? [closed]
                            
                                Best practices for storing settings
                            
                                How to access network file?
                            
                                why is inlined function slower than function pointer?
                            
                                Get number of elements greater than a number
                            
                                conditional vs operator?
                            
                                Partial Binding of Function Arguments
                            
                                tail recursion performance on template meta-programming
                            
                                Are there any equivalents to the futex in Linux/Unix?
                            
                                How is it parsed: constructing unnamed temporary with braced init list
                            
                                Segmentation fault while running Google test
                            
                                Is there a calculation error in my program?
                            
                                Return value or rvalue reference?
                            
                                Data-oriented tree traversal without recursion
                            
                                Are short circuit evaluation rules expected at compile time?
                            
                                C++ noexcept for a function not throwing exceptions, but can cause a memory failure

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fastest way to compute the cdf of the Normal distribution over vectors - R::pnorm vs erfc vs?

Tags:

c++

r

rcpp

armadillo

chameau13

People also ask

1 Answers

Martin Mächler

Recent Activity

Donate For Us