numpy routines don't appear to be that fast

Tags:

I'm using python to do some Bayesian statistics. I've coded it up in python and in Fortran 95. The Fortran code is waaay faster... like a factor of 100. I expected the Fortran to be faster, but I was really hoping that by using numpy I could get the python code to come close, maybe within a factor of 2. I've profiled the python code and it looks like the majority of the time is spent doing the following things:

scipy.stats.rvs: taking a random draw from a distribution. I do this ~19000 times and it takes a total time of 3.552 sec

numpy.slogdet: computing the log of the determinant of a matrix. I do this ~10,000 and it takes a total of 2.48 s

numpy.solve: solve a linear system: I call this routine ~10,000 times for a total time of 2.557 s

In total my code runs in ~ 11 sec whereas my fortran code takes .092 sec. Is this a joke? I'm really not trying to be unrealistic in my expectations of python, and I certainly don't expect to get my python code to be as fast as Fortran.. but to be slower by a factor of > 100.. Python's gotta be able to do better than that. Just in case you are curious, here is the full output of my profiler:( I don't know why it broke the text into several blocks)

     1290611 function calls in 11.296 CPU seconds

Ordered by: internal time, function name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

18973    0.864    0.000    3.552    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:484(rvs)
 9976    0.819    0.000    2.480    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:1559(slogdet)
 9976    0.627    0.000    6.659    0.001 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:77(evaluate_posterior)
 9384    0.591    0.000    0.753    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:39(construct_R_matrix)
77852    0.533    0.000    0.533    0.000 :0(array)
37946    0.520    0.000    1.489    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
77851    0.423    0.000    0.956    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:216(asarray)
37946    0.360    0.000    0.360    0.000 :0(all)
 9976    0.335    0.000    2.557    0.000 /usr/lib64/python2.6/sitepackages/scipy/linalg/basic.py:23(solve)
107799    0.322    0.000    0.322    0.000 :0(len)

109740    0.301    0.000    0.301    0.000 :0(issubclass)

28357    0.294    0.000    0.294    0.000 :0(prod)
 9976    0.287    0.000    0.957    0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:45(find_best_lapack_type)
    1    0.282    0.282   11.294   11.294 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:199(get_rho_lambda_draws)
 9976    0.269    0.000    1.386    0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:60(get_lapack_funcs)
19952    0.263    0.000    0.476    0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:23(cast_to_lapack_prefix)
19952    0.235    0.000    0.669    0.000 /usr/lib64/python2.6/site-packages/numpy/lib/function_base.py:483(asarray_chkfinite)
66833    0.212    0.000    0.212    0.000 :0(log)
18973    0.207    0.000    1.054    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1427(product)
29931    0.205    0.000    0.205    0.000 :0(reduce)
28949    0.187    0.000    0.856    0.000 :0(map)
 9976    0.175    0.000    0.175    0.000 :0(dot)
47922    0.163    0.000    0.163    0.000 :0(getattr)
 9976    0.157    0.000    0.206    0.000 /usr/lib64/python2.6/site-packages/numpy/lib/twodim_base.py:169(eye)
19952    0.154    0.000    0.271    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:32(loggbeta)
18973    0.151    0.000    0.793    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1548(all)
19953    0.146    0.000    0.146    0.000 :0(any)
 9976    0.142    0.000    0.316    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:99(_commonType)
 9976    0.133    0.000    0.133    0.000 :0(dgetrf)
18973    0.125    0.000    0.175    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:462(_fix_loc_scale)
39904    0.117    0.000    0.117    0.000 :0(append)
18973    0.105    0.000    0.292    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1461(alltrue)
19952    0.102    0.000    0.102    0.000 :0(zeros)
19952    0.093    0.000    0.154    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:71(isComplexType)
19952    0.090    0.000    0.090    0.000 :0(split)
 9976    0.089    0.000    2.569    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:62(get_log_determinant_of_matrix)
19952    0.087    0.000    0.134    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:35(logggamma)
 9976    0.083    0.000    0.154    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:139(_fastCopyAndTranspose)
 9976    0.076    0.000    0.125    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:157(_assertSquareness)
 9976    0.074    0.000    0.097    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:151(_assertRank2)
 9976    0.072    0.000    0.119    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:127(_to_native_byte_order)
18973    0.072    0.000    0.072    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:832(_argcheck)
 9976    0.072    0.000    0.228    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:901(diagonal)
 9976    0.070    0.000    0.070    0.000 :0(arange)
 9976    0.061    0.000    0.061    0.000 :0(diagonal)
 9976    0.055    0.000    0.055    0.000 :0(sum)
 9976    0.053    0.000    0.075    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:84(_realType)
11996    0.050    0.000    0.091    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:1412(_rvs)
 9384    0.047    0.000    0.162    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1898(prod)
 9976    0.045    0.000    0.045    0.000 :0(sort)
11996    0.041    0.000    0.041    0.000 :0(standard_normal)
 9976    0.037    0.000    0.037    0.000 :0(_fastCopyAndTranspose)
 9976    0.037    0.000    0.037    0.000 :0(hasattr)
 9976    0.037    0.000    0.037    0.000 :0(range)
 6977    0.034    0.000    0.055    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:3731(_rvs)
 9977    0.027    0.000    0.027    0.000 :0(max)
 9976    0.023    0.000    0.023    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:498(isfortran)
 9977    0.022    0.000    0.022    0.000 :0(min)
 9976    0.022    0.000    0.022    0.000 :0(get)
 6977    0.021    0.000    0.021    0.000 :0(uniform)
    1    0.001    0.001   11.295   11.295 <string>:1(<module>)
    1    0.001    0.001   11.296   11.296 profile:0(get_rho_lambda_draws(correlations,energies,rho_priors,lambda_e_prior,lambda_z_prior,candidate_sig2_rhos,candidate_sig2_lambda_e,candidate_sig2_lambda_z,3000))
    2    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:445(__call__)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:385(__init__)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:175(_array2string)
    2    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:475(_digits)
    2    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:309(_extendLine)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:317(_formatArray)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1477(any)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:243(array2string)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:1390(array_str)
    1    0.000    0.000    0.000    0.000 :0(compress)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:394(fillFormat)
    6    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:2166(geterr)
   12    0.000    0.000    0.000    0.000 :0(geterrobj)
    0    0.000             0.000          profile:0(profiler)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1043(ravel)
    1    0.000    0.000    0.000    0.000 :0(ravel)
    8    0.000    0.000    0.000    0.000 :0(rstrip)
    6    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:2070(seterr)
    6    0.000    0.000    0.000    0.000 :0(seterrobj)
    1    0.000    0.000    0.000    0.000 :0(setprofile)

EDIT:

Here is copy of the relevant routines

def get_rho_lambda_draws(correlations, energies, rho_priors, lam_e_prior, lam_z_prior,  
                         candidate_sig2_rhos, candidate_sig2_lambda_e, 
                         candidate_sig2_lambda_z, ndraws):

    nBasis = len(correlations[0])
    nStruct = len(correlations)

    rho _draws = [ [0.5 for x in xrange(nBasis)] for y in xrange(ndraws)]
    lambda_e_draws = [ 5 for x in xrange(ndraws)]
    lambda_z_draws = [ 5 for x in xrange(ndraws)]
            
    accept_rhos = array([0. for x in xrange(nBasis)])
    accept_lambda_e = 0.
    accept_lambda_z = 0.

    for i in xrange(1,ndraws):
        if i % 100 == 0:
            print i, "REP<---------------------------------------------------------------------------------"
        #do metropolis to get rho
        rho_draws[i] = [x for x in rho_draws[i-1]]
        lambda_e_draws[i] = lambda_e_draws[i-1]
        lambda_z_draws[i] = lambda_z_draws[i-1]

        rho_vec = [x for x in rho_draws[i-1]]
        R_matrix_before =construct_R_matrix(correlations,correlations,rho_vec)
        post_before = evaluate_posterior(R_matrix_before,rho_vec,energies,lambda_e_draws[i-1],lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)

        index = 0
        for j in xrange(nBasis):
            cand = norm.rvs(rho_draws[i-1][j],scale=candidate_sig2_rhos[j])
            if 0.0 < cand < 1.0:
                rho_vec[j] = cand

                R_matrix_after = construct_R_matrix(correlations,correlations,rho_vec)
                post_after = evaluate_posterior(R_matrix_after,rho_vec,energies,lambda_e_draws[i-1],lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)
                metrop_value = post_after - post_before
                unif = log(uniform.rvs(0,1))
                if metrop_value > unif:
                    rho_draws[i][j] = cand
                    post_before = post_after
                    accept_rhos[j] += 1
                else:
                    rho_vec[j] = rho_draws[i-1][j]



        R_matrix = construct_R_matrix(correlations,correlations,rho_vec)
        cand = norm.rvs(lambda_e_draws[i-1],scale=candidate_sig2_lambda_e)
        if cand > 0.0:
            post_after = evaluate_posterior(R_matrix,rho_vec,energies,cand,lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)

            metrop_value = post_after - post_before
            unif = log(uniform.rvs(0,1))
            if metrop_value > unif:
                lambda_e_draws[i] = cand
                post_before = post_after
                accept_lambda_e = accept_lambda_e + 1


        cand = norm.rvs(lambda_z_draws[i-1],scale=candidate_sig2_lambda_z)
        if cand > 0.0:
            post_after = evaluate_posterior(R_matrix,rho_vec,energies,lambda_e_draws[i],cand,lam_e_prior,lam_z_prior,rho_priors)
            metrop_value = post_after - post_before
            unif = log(uniform.rvs(0,1))
            if metrop_value > unif:
                lambda_z_draws[i] = cand
                post_before = post_after
                accept_lambda_z = accept_lambda_z + 1


    print accept_rhos/ndraws
    print accept_lambda_e/ndraws
    print accept_lambda_z/ndraws
    return [rho_draws,lambda_e_draws,lambda_z_draws]


def evaluate_posterior(R_matrix,rho_vec,energies,lambda_e,lambda_z,lam_e_prior,lam_z_prior,rho_prior_params):

    #    from scipy.linalg import solve
    #from numpy import allclose

    working_matrix = eye(len(R_matrix))/lambda_e + R_matrix/lambda_z
    logdet = get_log_determinant_of_matrix(working_matrix)

    x = solve(working_matrix,energies,sym_pos=True)
    #    if not allclose(dot(working_matrix,x),energies):
#        exit('solve routine didnt work')

    rho_priors = sum([loggbeta(rho_vec[j],rho_prior_params[j][0],rho_prior_params[j][1]) for j in xrange(len(rho_vec))])

    loggposterior = -.5 * logdet - .5*dot(energies,x) + logggamma(lambda_e,lam_e_prior[0],lam_e_prior[1]) + logggamma(lambda_z,lam_z_prior[0],lam_z_prior[1]) + rho_priors #(a_e-1)*log(lambda_e) - b_e*lambda_e + (a_z-1)*log(lambda_z) - b_z*lambda_z + rho_priors
    return loggposterior

def construct_R_matrix(listone,listtwo,rhos):

    return prod(rhos[:]**(4*(listone[:,newaxis]-listtwo)**2),axis=2)

(Once again... I don't know why It breaks my input up into several blocks when I post.. I hope you can decifer it)

952

asked Mar 19 '12 23:03

legoses

3 Answers

It is hard to tell exactly what's going on with your code, But my suspicion is that you just have some data which is not (or could not be) very vectorized. Because obviously the call of .rvs() 19000 times is going to be way slower than the .rvs(size=19000). See:

  In [5]: %timeit x=[scipy.stats.norm().rvs() for i in range(19000)]
  1 loops, best of 3: 1.23 s per loop

  In [6]: %timeit x=scipy.stats.norm().rvs(size=19000)
  1000 loops, best of 3: 1.67 ms per loop

So if you indeed have a not very vectorized code or algorithm it is well expected to be slower than fortran.

answered Nov 03 '22 06:11

sega_sai

Check out the performance page created by the SciPy/NumPy folks. There are a number of remarkably easy extras that foster very fast code. Among them are (a) using the weave module, especially the inline and blitz options. (b) Using Cython to write some of your functions in C but be able to call and use them in Python.

I do a lot of large-scale scientific computing work in Python for statistics, finance, and (in grad school) computer vision. The reason why Python is excellent for these kinds of issues is not that my naive, first hack code would yield the fastest solution, but because in Python I can easily interface with tons of other tasks. I can easily issue Linux commands for other programs, easily read and parse most data files, easily interface with SQL and other databse software; I have all of the R statistics library available, use of OpenCV commands (in much much nicer syntax that the C++ version), and much more.

When the importance of my task was to manipulate a new dataset and get my hands dirty, feeling out the nuances of that data, then Python's ease of programming, along with matplotlib, made it much better. Later on, when I need to scale things up, I can always use PyCUDA, Cython, or just rewrite things in C++ if high-end performance is required. Since most machines have multiprocessors now, the multiprocessing module, as well as mpi4py, allow me to quickly and cheaply turn annoying for-loop style tasks into much shorter tasks, without needing to migrate to C++.

In short, the real utility of Python doesn't come from the language all by itself, but from becoming really proficient with the add-ons and extras that let you cheaply make your little set of common problems execute quickly on the data sets that matter in the day-to-day.

Real-time embedded communications software is going to be using C++ for a long time to come... same for high-frequency trading strategies. But then again, professional solutions to these types of things is not really what Python is meant for. And in some cases, folks prefer unusual solutions for that stuff anyway.

answered Nov 03 '22 05:11

ely

Get rid of the two for loops and two list comprehensions by replacing them with Numpy functions and constructs that use numpy.ndarrays. Also do not print in between the computation - that is also slow. You can probably get 10-50 fold speed increase just by following the above advice.

Also see http://www.scipy.org/PerformancePython/

answered Nov 03 '22 05:11

peterhil

Related questions
                            
                                Unordered collection for unhashable objects?
                            
                                Iteratively parsing HTML (with lxml?)
                            
                                SQLAlchemy is Throwing an IntegrityError due to a DBSession.add()
                            
                                why is my text not aligning properly in wxPython?
                            
                                Binding <Return> to button is not working as expected
                            
                                Python Value in List
                            
                                python csv DictReader type
                            
                                Efficient ways to write a large NumPy array to a file
                            
                                Dynamically bind method to class instance in python
                            
                                python-mode documentation for Emacs
                            
                                Python Struct module behaves strange
                            
                                Is there any way to make simplejson less strict?
                            
                                python ctypes sending pointer to structure as parameter to native library
                            
                                Python module not working (not able to import) even after including it in PATH variable
                            
                                Apache 2 + mod_wsgi + WSGIScriptAlias
                            
                                Python replace using regex
                            
                                Python Generator Expression for Accumulating Dictionary Values
                            
                                Any good use for list/dict assignment during for loop?
                            
                                getting line-numbers that were changed
                            
                                Storing global config variables in a Pyramid project

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

numpy routines don't appear to be that fast

Tags:

performance

python

fortran

legoses

People also ask

3 Answers

sega_sai

ely

peterhil

Recent Activity

Donate For Us