I am trying to optimize my Python code. One of the bottlenecks comes when I apply a function to a NumPy array according to each element's value. For instance, I have an array with a thousand elements, and I apply one function to the values greater than a tolerance and another function (a Taylor series) to the rest. I use masking, but it is still slow, not least because I call the following functions about 64 million times.
import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def masked_condition_zero(array, tolerance):
    """ Return boolean masks selecting the values less than (and not less than) tolerance
    """
    # search indices where |array| < tolerance
    indzeros_ = np.where(np.abs(array) < tolerance)[0]
    # create mask
    mask_ = np.ones(np.shape(array), dtype=bool)
    mask_[indzeros_] = False
    return (~mask_, mask_)
def bernoulli_function1(zeta):
    """ Returns the Bernoulli function of zeta, vector version
    """
    # get the masks according to the condition
    zeros_, others_ = masked_condition_zero(zeta, EPSILONZETA)
    # create an array filled with zeros
    fb_ = np.zeros(np.shape(zeta))
    # apply the original function to the values greater than EPSILONZETA
    fb_[others_] = zeta[others_]/(np.exp(zeta[others_])-1.0)
    # compute the series for zeta < eps
    zeta0_ = zeta[zeros_]
    zeta2_ = zeta0_ * zeta0_
    zeta4_ = zeta2_ * zeta2_
    fb_[zeros_] = 1.0 - 0.5*zeta0_ + ZETA1_12 * zeta2_ - ZETA1_720 * zeta4_
    return fb_
Now suppose you have an array zeta with negative and positive floats, which changes in every iteration of a loop that runs for 2^26 iterations, and you want to compute bernoulli_function1(zeta) each time.
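Roughly, the calling pattern looks like this (the update of zeta shown here is just a stand-in for the real computation):

import numpy as np

rng = np.random.default_rng(0)
zeta = rng.uniform(-5.0, 5.0, 1000)               # a thousand-element array

for step in range(2**26):                         # ~64 million iterations
    zeta += rng.uniform(-0.01, 0.01, zeta.size)   # stand-in update of zeta
    fb = bernoulli_function1(zeta)                # this call dominates the run time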
Is there a better solution?
The basic structure of the problem is:
def foo(zeta):
    result = np.empty_like(zeta)
    I = condition(zeta)
    nI = ~I
    result[I] = func1(zeta[I])
    result[nI] = func2(zeta[nI])
    return result
It looks like your polynomial expression can be evaluated at all zeta, but it is the 'exception', the fallback calculation used when zeta is too close to 0. If both functions can be evaluated for every zeta, you could use np.where:
np.where(condition(zeta), func1(zeta), func2(zeta))
This is a streamlined version of:

def foo(zeta):
    result = np.empty_like(zeta)
    I = condition(zeta)
    nI = ~I
    v1 = func1(zeta)
    v2 = func2(zeta)
    result[I] = v1[I]
    result[nI] = v2[nI]
    return result
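Applied to your Bernoulli function that would look roughly like the sketch below. np.where evaluates both branches over the whole array, so the exact expression also runs at the near-zero values; np.errstate silences the harmless warnings that produces. This is illustrative, not a drop-in for your exact code:

import numpy as np

def bernoulli_where(zeta, tolerance=1.0e-6):
    zeta2 = zeta * zeta
    series = 1.0 - 0.5*zeta + zeta2/12.0 - zeta2*zeta2/720.0
    with np.errstate(divide='ignore', invalid='ignore'):
        exact = zeta/(np.exp(zeta) - 1.0)   # evaluated everywhere, even near 0
    return np.where(np.abs(zeta) < tolerance, series, exact)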
Another option is to apply one function to all values, and the other only to the 'exceptions'.
def foo(zeta):
    result = func2(zeta)
    I = condition(zeta)
    result[I] = func1(zeta[I])
    return result

and of course the reverse: result = func1(zeta); result[nI] = func2(zeta[nI]).
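Since your polynomial is safe for every zeta, that strategy could look like this (a sketch that reuses the constants from your question):

import numpy as np

EPSILONZETA = 1.0e-6    # same constants as in the question
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def bernoulli_function2(zeta):
    """Taylor series everywhere, exact formula overwritten where |zeta| >= tolerance."""
    zeta2 = zeta * zeta
    fb = 1.0 - 0.5*zeta + ZETA1_12*zeta2 - ZETA1_720*zeta2*zeta2
    big = np.abs(zeta) >= EPSILONZETA
    zb = zeta[big]
    fb[big] = zb/(np.exp(zb) - 1.0)   # overwrite where the exact formula is safe
    return fb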
In my brief timing tests, func1 and func2 take about the same time. masked_condition_zero also takes about that long, but the simpler np.abs(array) < tolerance (and its negation ~J) cuts that in half.
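In other words, the helper collapses to a single comparison, something like:

J = np.abs(zeta) < EPSILONZETA    # True where the series fallback is needed
nJ = ~J                           # True where the exact formula applies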
Let's compare the allocation strategies:
def foo(zeta, J, nJ):
    result = np.empty_like(zeta)
    result[J] = fun1(zeta[J])
    result[nJ] = fun2(zeta[nJ])
    return result
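For reference, a setup along these lines reproduces the shape of the test; the array size and the exact fun1/fun2 bodies here are illustrative assumptions, not the original benchmark:

import numpy as np

N = 1000
rng = np.random.default_rng(0)
zeta = rng.uniform(0.1, 5.0, N)

J = np.zeros(N, dtype=bool)
J[:N // 10] = True                 # ~10% of the elements take the fun1 branch
rng.shuffle(J)
nJ = ~J

def fun1(z):                       # stand-in for one branch
    return z/(np.exp(z) - 1.0)

def fun2(z):                       # stand-in for the other (series) branch
    z2 = z*z
    return 1.0 - 0.5*z + z2/12.0 - z2*z2/720.0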
For a sample where zeta[J] is 10% of the full zeta, some sample times are:
In [127]: timeit foo(zeta, J, nJ)
10000 loops, best of 3: 55.7 µs per loop
In [128]: timeit result=fun2(zeta); result[J]=fun1(zeta[J])
10000 loops, best of 3: 49.2 µs per loop
In [129]: timeit np.where(J, fun1(zeta),fun2(zeta))
10000 loops, best of 3: 73.4 µs per loop
In [130]: timeit result=fun1(zeta); result[nJ]=fun2(zeta[nJ])
10000 loops, best of 3: 60.7 µs per loop
The second case is fastest because running fun1 on fewer values compensates for the added cost of indexing zeta[J]. There's a trade-off between the cost of indexing and the cost of function evaluation. Boolean indexing like this is more expensive than slicing. With other mixes of values, the timings could go the other direction.
It looks like a problem where you can whittle away at the times, but I don't see any breakthrough strategy that would cut them by an order of magnitude.