Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

No speed gains from Cython

I am trying to define a function that contains an inner loop for simulating an integral.

The problem is speed. Evaluating the function once can take up to 30 seconds on my machine. Since my ultimate goal is to minimize this function, some extra speed would be nice.

As such, I have tried to get Cython to work, but I must be making a severe mistake (likely many of them!). Following the Cython documentation, I have tried to type my variables. After doing so, the code is just as slow as pure Python. This seems strange.

Here is my code:

import numpy as np  cimport cython cimport numpy as np import minuit  data = np.genfromtxt('q6data.csv', usecols = np.arange(1, 24, 1), delimiter = ',')    cdef int ns    = 1000                 # Number of simulation draws cdef int K     = 5                    # Number of observed characteristics, including            constant cdef int J     = len(data[:,1])       # Number of products, including outside cdef double tol   = 0.0001            # Inner GMM loop tolerance nu = np.random.normal(0, 1, (6, ns))  # ns random deviates  @cython.boundscheck(False) @cython.wraparound(False)   def S(np.ndarray[double, ndim=1] delta, double s1, double s2, double s3, double s4,  double s5, double a):     """Computes the simulated integrals, one for each good.     Parameters: delta is an array of length J containing mean product specific utility levels     Returns: Numpy array with length J."""     cdef np.ndarray[double, ndim=2] mu_ij = np.dot((data[:,2:7]*np.array([s1, s2, s3, s4, s5])), nu[1:K+1,:])     cdef np.ndarray[double, ndim=2] mu_y  = a * np.log(np.exp(data[:,21].reshape(J,1) +  data[:,22].reshape(J,1)*nu[0,:].reshape(1, ns)) - data[:,7].reshape(J,1))     cdef np.ndarray[double, ndim=2] V = delta.reshape(J,1) + mu_ij + mu_y     cdef np.ndarray[double, ndim=2] exp_vi = np.exp(V)     cdef np.ndarray[double, ndim=2] P_i = (1.0 / np.sum(exp_vi[np.where(data[:,1] == 71)], 0)) * exp_vi[np.where(data[:,1] == 71)]      cdef int yrs = 19     cdef int yr     for yr in xrange(yrs):         P_yr = (1.0 / np.sum(exp_vi[np.where(data[:,1]== (yr + 72))], 0)) *   exp_vi[np.where(data[:,1] == (yr + 72))]         P_i  = np.concatenate((P_i, P_yr))      cdef np.ndarray[double, ndim=1] S = np.zeros(dtype = "d", shape = J)     cdef int j     for j in xrange(ns):         S += P_i[:,j]     return (1.0 / ns) * S  def d_infty(np.ndarray[double, ndim=1] x, np.ndarray[double, ndim=1] y):     """Sup norm."""     return np.max(np.abs(x - y))   def T(np.ndarray[double, ndim=1] delta_exp, double s1, double s2, double s3, double s4,  double s5, double a):     """The contraction operator.  This function takes the parameters and the exponential     of the starting value of delta and returns the fixed point."""      cdef int iter = 0     cdef int maxiter = 200     cdef int i     for i in xrange(maxiter):          delta1_exp = delta_exp * (data[:, 8] / S(np.log(delta_exp), s1, s2, s3, s4, s5, a))                             print i         if d_infty(delta_exp, delta1_exp) < tol:                                                    break         delta_exp = delta1_exp     return np.log(delta1_exp)   def Q(double s1, double s2, double s3, double s4, double s5, double a):     """GMM objective function."""       cdef np.ndarray[double, ndim=1] delta0_exp = np.exp(data[:,10])                                                          cdef np.ndarray[double, ndim=1] delta1 = T(delta0_exp, s1, s2, s3, s4, s5, a)     delta1[np.where(data[:,10]==0)] = np.zeros(len(np.where(data[:,10]==0)))                 cdef np.ndarray[double, ndim=1] xi =  delta1 - (np.dot(data[:,2:7],   np.linalg.lstsq(data[:,2:7], delta1)[0]))        cdef np.ndarray[double, ndim=2] g_J = xi.reshape(J,1) * data[:,11:21]     cdef np.ndarray[double, ndim=1] G_J = (1.0 / J) * np.sum(g_J, 0)      return np.sqrt(np.dot(G_J, G_J)) 

I have profiled the code, and it seems to be the function S, the integral simulator, that is killing performance. Anyway, I would have expected at least some speed gains from typing my variables. Since it produced no gains, I am led to believe that I am making some fundamental mistakes.

Does anybody see a glaring error in the Cython code that could lead to this result?

Oh, and since I am very new to programming, there is surely a lot of bad style and things slowing the code down. If you have the time, feel free to set me straight on these points as well.

like image 925
Randall J Avatar asked Nov 24 '10 07:11

Randall J


People also ask

Does Cython improve performance?

The CPython + Cython implementation is the fastest; it is 44 times faster than the CPython implementation. This is an impressive speed improvement, especially considering that the Cython code is very close to the original Python code in its design.

Does Cython make code faster?

Cython will get you good speedups on almost any raw Python code, without too much extra effort at all. The key thing to note is that the more loops you're going through, and the more data you're crunching, the more Cython can help.

Why is Cython slow?

When Cython is slower, it's probably due to type conversions, and possibly exacerbated by a lack of type annotations. Also, if you use C datastructures in Cython, that'll tend to be faster than using Python datastructures in Cython.

How do I speed up Cython in Python?

You can still write regular code in Python, but to speed things up at run time Cython allows you to replace some pieces of the Python code with C. So, you end up mixing both languages together in a single file.


2 Answers

Cython can produce an html file to help with this:

cython -a MODULE.py 

This shows each line of source code colored white through various shades of yellow. The darker the yellow color, the more dynamic Python behaviour is still being performed on that line. For each line that contains some yellow, you need to add more static typing declarations.

When I'm doing this, I like to split parts of my source code that I'm having trouble with onto many separate lines, one for each expression or operator, to get the most granular view.

Without this, it's easy to overlook some static type declarations of variables, function calls or operators. (e.g. the indexing operator x[y] is still a fully-dynamic Python operation unless you declare otherwise)

like image 177
Jonathan Hartley Avatar answered Oct 10 '22 21:10

Jonathan Hartley


Cython doesn't offer automatic performance gains, you have to know its internals and check the generated C code.

In particular if you want to improve loops performances, you have to avoid calling Python functions in them, which you happen to do a lot in this case (all the np. calls are Python calls, slicing, and probably other things).

See this page for general guidelines about performance optimization with Cython (the -a switch really is handy when optimizing) and this one for specificities when optimizing numpy code.

like image 22
Luper Rouch Avatar answered Oct 10 '22 20:10

Luper Rouch