In a C++ CPU-bound simulation that I'm writing, I've traced a bottleneck in my program via valgrind to cmath::exp. It currently eats over 40% of my simulation time. I can bound the input to a relatively small domain, but I'd like to control the accuracy. I was considering moving to a LUT (lookup table) to replace exp, but I'm not quite sure how to do this the "right way"(tm). Concerns I have:
What is the best way to implement a LUT for exp (or include one from a library)?
The optimal lookup table size is determined by the trade-off you make between performance, accuracy, and implementation complexity. You will have to profile; we cannot tell you the answer (we don't know the answer).
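For illustration only, here is a minimal sketch of the kind of table you would be profiling, assuming the input really can be clamped to a known range [lo, hi]; the class name, range handling, and linear interpolation are my own choices, not something prescribed by the question:

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: exp() approximated by n samples over [lo, hi]
// (n >= 2) with linear interpolation between neighbouring entries.
// Accuracy vs. memory is controlled entirely by n -- profile to pick it.
class ExpTable {
public:
    ExpTable(double lo, double hi, std::size_t n)
        : lo_(lo), step_((hi - lo) / (n - 1)), table_(n)
    {
        for (std::size_t i = 0; i < n; ++i)
            table_[i] = std::exp(lo_ + i * step_);
    }

    double operator()(double x) const
    {
        // Caller is assumed to keep x inside [lo, hi].
        double pos = (x - lo_) / step_;
        std::size_t i = static_cast<std::size_t>(pos);
        if (i + 1 >= table_.size())          // x at (or just past) hi
            return table_.back();
        double frac = pos - i;               // in [0, 1)
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

private:
    double lo_, step_;
    std::vector<double> table_;
};

For a smooth function like exp, doubling n roughly quarters the linear-interpolation error, so trying a handful of sizes is usually enough to map out the accuracy/cache-pressure trade-off.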
Use lrint from <math.h> to convert double to long int. I'm not sure if it's in <cmath>.
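For example, a hypothetical index computation along those lines (lo and inv_step are placeholders for whatever range and spacing the table uses):

#include <cmath>

// Round (x - lo) / step to the nearest table slot.
long index_of(double x, double lo, double inv_step)
{
    return std::lrint((x - lo) * inv_step);
}

For what it's worth, C++11 did add std::lrint to <cmath>, so the <math.h> fallback is only needed on pre-C++11 toolchains.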
I am not sure what slope has to do with converting floating point numbers to integers. Could you elaborate on what you are worried about?
Yes, you are reinventing the wheel. What you are doing has been done over and over again, by anybody who has ever implemented a math library. There is a lot of literature on the subject.
A straight look-up table is far from optimal. You will want to use some kind of polynomial approximation, perhaps a piecewise one with coefficients selected from a look-up table. For a function as smooth and predictable as exp, a polynomial will give you much higher accuracy for the same amount of computational effort. The required polynomials will depend on the tradeoff between complexity and accuracy, as well as whether you want to minimize expected error, minimize maximum error, or use some other loss function.
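As a rough illustration of the piecewise idea (not a tuned implementation; the segment count, the choice of a cubic Taylor term, and all names here are my own), you can store exp at each segment midpoint in a table and evaluate a short polynomial around the nearest midpoint:

#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: exp(x) ~ exp(m) * (1 + d + d^2/2 + d^3/6), where m is
// the midpoint of the segment containing x and d = x - m. The per-segment
// coefficient exp(m) lives in a small table.
class PiecewiseExp {
public:
    PiecewiseExp(double lo, double hi, std::size_t segments)
        : lo_(lo), width_((hi - lo) / segments), exp_mid_(segments)
    {
        for (std::size_t i = 0; i < segments; ++i)
            exp_mid_[i] = std::exp(lo_ + (i + 0.5) * width_);
    }

    double operator()(double x) const
    {
        // Caller is assumed to keep x inside [lo, hi].
        std::size_t i = static_cast<std::size_t>((x - lo_) / width_);
        if (i >= exp_mid_.size()) i = exp_mid_.size() - 1;   // x == hi
        double d = x - (lo_ + (i + 0.5) * width_);           // |d| <= width/2
        return exp_mid_[i] * (1.0 + d * (1.0 + d * (0.5 + d / 6.0)));
    }

private:
    double lo_, width_;
    std::vector<double> exp_mid_;
};

Coefficients fitted with a minimax tool (e.g. the Remez algorithm, or Sollya) will beat a plain Taylor expansion at the same degree; the sketch only shows the structure: cheap index, small coefficient table, short polynomial.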
Restricting the domain of exp does not actually help that much, since it's so easy to expand to the entire domain.
#include <cmath>

// only works for x in [0, 1]
double exp2_limited(double x);

// works for all x, but doesn't handle overflow properly
// (named exp2_full to avoid clashing with std::exp2 from <cmath>)
double exp2_full(double x)
{
    // x - floor(x) is the fractional part, always in [0, 1) even for
    // negative x; fmod(x, 1.0) would go negative there.
    double ip = std::floor(x);
    return std::scalbn(exp2_limited(x - ip), static_cast<int>(ip));
}
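If what you actually need is exp rather than exp2, the usual reduction is exp(x) = 2^(x · log2 e), so a base-2 kernel is all you have to approximate (a sketch reusing the hypothetical exp2_full above):

#include <cmath>

double exp_via_exp2(double x)
{
    // log2(e) = 1 / ln(2); computed once rather than relying on M_LOG2E,
    // which is not guaranteed by standard C++.
    static const double log2e = 1.0 / std::log(2.0);
    return exp2_full(x * log2e);
}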
Summary:
You have to know the required accuracy before you can design such a function.
You also have to choose the loss function (expected error, maximum error, or something else).
You have to profile before you know how fast it is.
Use polynomials.
I've had this problem and I took a few stack samples to diagnose it.
What that does is tell you where the calls are coming from and what the argument value is. I found that when exp was called from particular places, the argument value was highly repeatable.
That suggested a memoization approach, which made a huge difference.
It needed a simple "wrapper" function:
#include <cmath>

// Returns exp(arg), reusing the previous result when the argument has not
// changed since the last call at this call site.
double exp_cached(double arg, double* old_arg, double* old_result)
{
    if (arg == *old_arg) return *old_result;
    *old_arg = arg;
    *old_result = std::exp(arg);
    return *old_result;
}
and wherever exp(foo) used to be called, do:
static double old_arg = -999999999, old_result;
...
... exp_cached(foo, &old_arg, &old_result)...
That way, exp doesn't get called if the argument at that call site is the same as it was the last time.
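One way to package the same last-argument/last-result pair per call site, so callers don't have to declare the two statics themselves, is a tiny function object (my own variation on the wrapper above, not something the answer proposed):

#include <cmath>

// Hypothetical helper: each CachedExp instance remembers the last
// argument/result pair, exactly like exp_cached above.
struct CachedExp {
    double old_arg = -999999999.0;   // sentinel, same spirit as above
    double old_result = 0.0;

    double operator()(double arg)
    {
        if (arg != old_arg) {
            old_arg = arg;
            old_result = std::exp(arg);
        }
        return old_result;
    }
};

// At a call site:
//     static CachedExp cached_exp;
//     double y = cached_exp(foo);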