I am trying to optimize a loop in a piece of my code. I thought that rewriting it in a more numpy-ish way would make it faster, but it is slower now! The function takes as input a numpy array vec of length n:
from numpy import *

def f(vec):
    n = len(vec)
    aux = 0
    for i in range(n):
        aux = aux + (1 - aux)*vec[i]
    return aux
def f2(vec):
    n = len(vec)
    G = tril(array([-vec]*n), -1) + 1  # numpy way!
    aux = dot(G.prod(1), vec)
    return aux
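For what it's worth, the recurrence has a closed form: each iteration turns 1 - aux into (1 - aux)*(1 - vec[i]), so the whole loop collapses to 1 - prod(1 - vec), which numpy evaluates in linear time. A sketch:

```python
from numpy import array, prod

def f(vec):
    aux = 0
    for x in vec:
        aux = aux + (1 - aux) * x
    return aux

def f_closed_form(vec):
    # each step maps 1 - aux  ->  (1 - aux) * (1 - vec[i]),
    # so after the loop 1 - aux == prod(1 - vec)
    return 1 - prod(1 - vec)

v = array([0.1, 0.2, 0.3])
print(f(v), f_closed_form(v))  # both ~0.496
```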
if __name__ == '__main__':
    import timeit
    print(timeit.timeit("f(ones(225)+4)", setup="from __main__ import f\nfrom numpy import ones", number=1000))
    print(timeit.timeit("f2(ones(225)+4)", setup="from __main__ import f2\nfrom numpy import ones,tril,dot", number=1000))
0.429496049881 [s]
5.66514706612 [s]
Finally I decided to inline the whole function in my loop, getting a 3x performance boost. I am really looking for a 100x performance boost, but I don't know what else to do. This is my final function:
def CALC_PROB_LOC2(int nSectors, int nZones, double[:] beta, double[:] thetaLoc, np.ndarray[double, ndim=2] h, np.ndarray[double, ndim=2] p, np.ndarray[np.float64_t, ndim=3] U_nij, np.ndarray[double, ndim=2] A_ni):
    cdef np.ndarray[double, ndim=3] Pr_nij = np.zeros((nSectors, nZones, nZones), dtype="d")
    cdef np.ndarray[double, ndim=2] U_ni = np.zeros((nSectors, nZones), dtype="d")
    #cdef np.ndarray[np.float64_t, ndim=1] A_ni_pos
    cdef Py_ssize_t n, i, opt
    cdef int aux_bool, options
    cdef np.ndarray[np.float64_t, ndim=1] prob, attractor, optionCosts
    cdef np.ndarray[np.float64_t, ndim=1] eq23, utilities
    cdef double disu
    cdef double eq22
    cdef double aux17
    for n in range(nSectors):
        aux_bool = 1
        if n in [0, 2, 9, 10, 11, 12, 13, 14, 18, 19, 20]:
            for i in xrange(nZones):
                U_ni[n, i] = p[n, i] + h[n, i]
                Pr_nij[n, i, i] = 1
            aux_bool = 0
        if aux_bool == 1:
            if beta[n] <= 0:
                for i in xrange(nZones):
                    U_ni[n, i] = U_nij[n, i, i]
            else:
                A_ni_pos = A_ni[n, :] > 0
                options = len(A_ni[n, :][A_ni_pos])
                attractor = A_ni[n, :][A_ni_pos]
                if options > 0:
                    for i in xrange(nZones):
                        optionCosts = U_nij[n, i, A_ni_pos]
                        disu = 0
                        eq22 = 0
                        aux17 = 0
                        prob = np.ones(options)/options  # default value
                        if beta[n] == 0:
                            Pr_nij[n, i, A_ni_pos], U_ni[n, i] = prob, 0
                        if options == 1:
                            Pr_nij[n, i, A_ni_pos], U_ni[n, i] = prob, optionCosts
                        else:
                            if thetaLoc[n] <= 0:
                                cmin = 1
                            else:
                                cmin = (optionCosts**thetaLoc[n]).min()
                            if cmin == 0:
                                cmin = 100
                            utilities = optionCosts/cmin
                            eq23 = -beta[n]*utilities
                            eq23 = np.exp(eq23)
                            aux17 = np.dot(attractor, eq23)
                            if aux17 == 0:
                                Pr_nij[n, i, A_ni_pos], U_ni[n, i] = 0*prob, 0
                            else:
                                for opt in range(options):
                                    eq22 = eq22 + (1 - eq22)*eq23[opt]
                                prob = attractor*eq23/aux17
                                disu = cmin*(-np.log(eq22)/beta[n])
                                Pr_nij[n, i, A_ni_pos], U_ni[n, i] = prob, disu
    return Pr_nij, U_ni
That's what happens when a linear algorithm gets replaced by a quadratic one: no matter how fast it's executed, the better algorithm always wins (for a big enough problem).
It's pretty clear that f runs in linear time, and f2 runs in quadratic time, because that's the time complexity of a matrix-vector dot product.
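One can check the scaling empirically. In this sketch (both functions re-defined so it is self-contained, and with small input values to avoid overflow), doubling n roughly doubles f's time but roughly quadruples f2's:

```python
import timeit
import numpy as np

def f(vec):
    aux = 0
    for i in range(len(vec)):
        aux = aux + (1 - aux) * vec[i]
    return aux

def f2(vec):
    # builds an n x n matrix, so both memory and time are O(n^2)
    n = len(vec)
    G = np.tril(np.array([-vec] * n), -1) + 1
    return np.dot(G.prod(1), vec)

for n in (100, 200, 400):
    v = np.linspace(0.1, 0.5, n)
    t1 = timeit.timeit(lambda: f(v), number=20)
    t2 = timeit.timeit(lambda: f2(v), number=20)
    print(n, t1, t2)
```

Both functions compute the same value; only the growth rate of the timings differs.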
A log-log plot clearly shows the difference in running time (linear refers to f, quadratic to f2):
The right-most part of the green line (i.e., where it stops behaving as a straight line) can be explained by the fact that numpy functions usually have a high per-call overhead that's negligible for arrays that are not tiny, but dominates the running time when they are small.
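That per-call overhead is easy to observe (a rough sketch; the exact ratio depends on the machine):

```python
import timeit
import numpy as np

small = np.ones(4)
big = np.ones(400000)  # 100000x more elements

t_small = timeit.timeit(lambda: np.exp(small), number=1000)
t_big = timeit.timeit(lambda: np.exp(big), number=1000)

# the big array has 100000x the elements but is nowhere near 100000x
# slower per call, because fixed per-call overhead dominates for the
# tiny array
print(t_big / t_small)
```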
The "standard" way to speed up code in Python that's already using a fast algorithm is to reach for compiled code and write an extension. Cython lets you do that by annotating the Python source code with a few type annotations, and it understands numpy arrays.
By telling Cython that vec is an array of doubles, aux is a double and i an integer, it's able to generate a C extension which is 400x faster for me.
def f(double[:] vec):
    n = len(vec)
    cdef double aux = 0
    cdef int i
    for i in range(n):
        aux = aux + (1 - aux)*vec[i]
    return aux
If you happen to be using IPython, you can just run %load_ext cythonmagic and then copy that function to a cell prefixed by the line %%cython to try it out. Other ways to build and compile it are explained in the Cython documentation. By the way, IPython also lets you time code by writing %timeit before the statement to time; it's really handy.
A completely different option is to use PyPy, a Python 2.7 implementation that comes with a JIT and has some basic numpy support. It can run this small snippet if you replace import numpy with import numpypy, but it's possible that it won't be able to run your whole program. It is a tad slower than Cython, but it doesn't require a compiler nor annotating the code.