I'm developing code that integrates an ODE using scipy's complex_ode, where the integrand includes a Fourier transform and an exponential operator acting on a large array of complex values.
After profiling (and having already optimized the FFTs with PyFFTW, etc.), I've found that the main bottleneck is the line:
val = np.exp(float_value * arr)
I'm currently using NumPy, which I understand calls C code and thus should be quick. Is there any way to improve performance further?
I've looked into Numba, but since my main loop also includes FFTs, I don't think it can be compiled (the nopython=True flag leads to errors), so I suspect it offers no gain.
Here is a test example for the code I'd like to optimize:
import numpy as np
arr = np.random.rand(2**14) + 1j * np.random.rand(2**14)
float_value = 0.5
%timeit np.exp(float_value * arr)
Any suggestions are welcome, thanks.
We could leverage the numexpr module, which works really efficiently on large data involving transcendental operations:
In [91]: arr = np.random.rand(2**14) + 1j *np.random.rand(2**14)
...: float_value = 0.5
...:
In [92]: %timeit np.exp(float_value * arr)
1000 loops, best of 3: 739 µs per loop
In [94]: import numexpr as ne
In [95]: %timeit ne.evaluate('exp(float_value*arr)')
1000 loops, best of 3: 241 µs per loop
This seems consistent with the expected performance stated in the numexpr docs.
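Before swapping numexpr into the ODE right-hand side, it may be worth checking that it returns the same values as NumPy on your own data. The following is a minimal sketch rather than a definitive implementation: the helper name `exp_scaled` is made up for illustration, the fallback to NumPy when numexpr is not installed is an assumption about how you might want to package it, and it uses the standard `timeit` module so it also runs outside IPython.

```python
import timeit

import numpy as np

try:
    import numexpr as ne  # optional dependency
except ImportError:
    ne = None


def exp_scaled(arr, float_value):
    """Return exp(float_value * arr), using numexpr when it is installed.

    Hypothetical helper for illustration; falls back to plain NumPy.
    """
    if ne is not None:
        # Pass the operands explicitly instead of relying on frame lookup.
        return ne.evaluate(
            "exp(float_value * arr)",
            local_dict={"float_value": float_value, "arr": arr},
        )
    return np.exp(float_value * arr)


# Same shape and dtype as the test example in the question.
rng = np.random.default_rng(0)
arr = rng.random(2**14) + 1j * rng.random(2**14)
float_value = 0.5

# Correctness check against the NumPy expression being replaced.
reference = np.exp(float_value * arr)
result = exp_scaled(arr, float_value)
matches = np.allclose(result, reference)

# Rough timing; absolute numbers depend on the machine and thread count.
t_numpy = timeit.timeit(lambda: np.exp(float_value * arr), number=200)
t_fast = timeit.timeit(lambda: exp_scaled(arr, float_value), number=200)
print(f"match: {matches}, numpy: {t_numpy:.4f}s, exp_scaled: {t_fast:.4f}s")
```

Timings will differ from the figures quoted above depending on hardware and on how many threads numexpr is allowed to use, so treat the printed numbers as indicative only.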