I'm trying to convert my program to Numba, but I have a problem nesting one function inside another. My methods are based on NumPy's vectorize, but I can't do the same using Numba. Do you know any analogous examples that I could follow?
This is my program:
import numpy as np
import scipy
import functools
from multiprocessing import Pool
import lib as mlib
from tqdm import tqdm


class vectorize(np.vectorize):
    def __get__(self, obj, objtype):
        return functools.partial(self.__call__, obj)


class stability:
    def __init__(self):
        self.r1 = 20
        self.r2 = 50
        self.rz = 20
        self.zeta = self.r2 / self.r1
        self.beta = self.rz / self.r1
        self.Ms = 0.956e6
        self.delta = 1

    @vectorize
    def f(self, ro, rs):
        # print("delta=", self.delta)
        return rs / ro * np.exp((-1 / self.delta) * (ro - rs))

    @vectorize
    def mz(self, ro, rs):
        return (1 - self.f(ro, rs)**2) / (1 + self.f(ro, rs)**2)

    @vectorize
    def mro(self, ro, rs):
        return (2 * self.f(ro, rs)) / (1 + self.f(ro, rs)**2)

    @vectorize
    def E1m(self, a, b, N, rs, d):
        r = np.linspace(a + (b - a) / (2 * N), b - (b - a) / (2 * N), N)
        fx = r * ((1 / self.delta + 1 / r)**2 * self.mro(r, rs)**2
                  + (1 / r**2 + 1) * self.mro(r, rs)**2
                  + d * (-(1 / self.delta + 1 / r) * self.mro(r, rs)
                         + 1 / r * self.mro(r, rs) * self.mz(r, rs)))
        area = np.sum(fx) * (b - a) / N
        return area


if __name__ == "__main__":
    rs = np.arange(0, 100, 1)
    model = stability()
    print(model.E1m(0, 20, 300, rs, 2))
Using vectorize(), you write your function as operating over input scalars rather than arrays; Numba generates the surrounding loop (or kernel), allowing efficient iteration over the actual inputs. The vectorize() decorator takes a list of the signatures you want to support.

Again, some have observed numba.vectorize to be faster than normal for loops, whereas even the NumPy documentation states about numpy.vectorize: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

NumPy support in Numba comes in many forms: Numba understands calls to NumPy ufuncs and is able to generate equivalent native code for many of them. NumPy arrays are directly supported in Numba. Access to NumPy arrays is very efficient, as indexing is lowered to direct memory accesses when possible.

For larger input data, the Numba version of a function is much faster than the NumPy version, even taking the compilation time into account. In fact, the ratio of the NumPy and Numba run times depends on both the data size and the number of loops, or more generally on the nature of the function being compiled.
Most of the built-in NumPy functions are already vectorized and don't need the np.vectorize decorator at all. In general the numpy.vectorize decorator will produce very slow results (compared to NumPy)! As the documentation mentions in the Notes section:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
You can improve your code's efficiency immensely by removing the decorator from f, mz, and mro. It will give the same results but run much faster (your code: 10.4 seconds, changed code: 0.014 seconds).
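To illustrate why dropping the decorator works: these formulas only use elementwise NumPy operations, which already broadcast over arrays on their own. A minimal sketch, with delta as a plain variable standing in for self.delta:

```python
import numpy as np

delta = 1.0  # stands in for self.delta

# Same formulas as plain functions: NumPy broadcasting handles
# array inputs elementwise without any decorator.
def f(ro, rs):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))

def mz(ro, rs):
    return (1 - f(ro, rs)**2) / (1 + f(ro, rs)**2)

ro = np.linspace(1.0, 20.0, 5)
print(mz(ro, 10.0))  # vector result, no np.vectorize needed
```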
The E1m function could also be improved (in terms of performance) by using broadcasting instead of vectorize.
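For illustration, here is one way that broadcasting version could look (a sketch under the stated assumptions, not tested against the original): making r a column vector lets it broadcast against the rs row vector, so the whole (N, len(rs)) grid is evaluated at once and summed along the r axis.

```python
import numpy as np

delta = 1.0  # stands in for self.delta

def f(ro, rs):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))

def mro(ro, rs):
    return 2 * f(ro, rs) / (1 + f(ro, rs)**2)

def mz(ro, rs):
    return (1 - f(ro, rs)**2) / (1 + f(ro, rs)**2)

def E1m(a, b, N, rs, d):
    # Column vector r broadcasts against the rs row vector,
    # giving an (N, len(rs)) grid in one shot.
    r = np.linspace(a + (b - a) / (2 * N), b - (b - a) / (2 * N), N)[:, None]
    m = mro(r, rs)
    fx = r * ((1 / delta + 1 / r)**2 * m**2
              + (1 / r**2 + 1) * m**2
              + d * (-(1 / delta + 1 / r) * m + 1 / r * m * mz(r, rs)))
    # Midpoint-rule sum over the r axis, one value per rs entry
    return fx.sum(axis=0) * (b - a) / N

print(E1m(0, 20, 300, np.arange(0, 100, 1), 2))
```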
However, since your question was about how to use numba.vectorize on these functions, I have some bad news: it's not possible to use numba.vectorize on instance methods, because Numba needs type information and none is available for custom Python classes.

In general it's also best with Numba to start with pure-loop code on NumPy arrays (no vectorize) and then apply the numba njit decorator (or jit(nopython=True)). That won't work on methods either, but it's much easier to pass in scalar arguments and only iterate over the required arrays.
But if you really want to use the vectorize approach, here's how you would do it for f:

You cannot use self, so you need a static method or a standalone function. Because you don't have access to self, you either need to pass in delta or make it global. I decided to make it an argument:

def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))
Then you need to define the types you want to support in the signature (return_type(argument_1_type, argument_2_type, ...)); here all arguments are doubles, since ro comes from np.linspace:

@nb.vectorize('f8(f8, f8, f8)')
def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))
And that's basically it.
For mz and mro you can do the same (remember that you also need the delta there):

@nb.vectorize('f8(f8, f8, f8)')
def mz(ro, rs, delta):
    return (1 - f(ro, rs, delta)**2) / (1 + f(ro, rs, delta)**2)

@nb.vectorize('f8(f8, f8, f8)')
def mro(ro, rs, delta):
    return (2 * f(ro, rs, delta)) / (1 + f(ro, rs, delta)**2)
Converting the E1m function seems to be a bit trickier (I haven't attempted it), and I leave it as an exercise for the reader.
In case you're interested in how I would solve it without vectorize:
import numpy as np
import numba as nb

@nb.njit
def f(ro, rs, delta):
    return rs / ro * np.exp((-1 / delta) * (ro - rs))

@nb.njit
def mz(ro, rs, delta):
    f_2 = f(ro, rs, delta) ** 2
    return (1 - f_2) / (1 + f_2)

@nb.njit
def mro(ro, rs, delta):
    f_ = f(ro, rs, delta)
    return (2 * f_) / (1 + f_**2)

@nb.njit(parallel=True)
def E1m(a, b, N, rs, d):
    delta = 1
    r = np.linspace(a + (b - a) / (2 * N), b - (b - a) / (2 * N), N)
    result = np.empty(rs.size)
    for idx in nb.prange(rs.size):
        rs_item = rs[idx]
        sum_ = 0.
        for r_item in r:
            mro_ = mro(r_item, rs_item, delta)
            sum_ += r_item * ((1 / delta + 1 / r_item)**2 * mro_**2
                              + (1 / r_item**2 + 1) * mro_**2
                              + d * (-(1 / delta + 1 / r_item) * mro_
                                     + 1 / r_item * mro_ * mz(r_item, rs_item, delta)))
        result[idx] = sum_ * (b - a) / N
    return result
There's probably still a bit that could be optimized by loop-lifting or smarter approaches to the calculations, but on my computer it's already pretty fast: ~100 microseconds compared to the 14 milliseconds from above, so again 100 times faster.