Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy routines don't appear to be that fast

I'm using python to do some Bayesian statistics. I've coded it up in python and in Fortran 95. The Fortran code is waaay faster... like a factor of 100. I expected the Fortran to be faster, but I was really hoping that by using numpy I could get the python code to come close, maybe within a factor of 2. I've profiled the python code and it looks like the majority of the time is spent doing the following things:

scipy.stats.rvs: taking a random draw from a distribution. I do this ~19000 times and it takes a total time of 3.552 sec

numpy.slogdet: computing the log of the determinant of a matrix. I do this ~10,000 and it takes a total of 2.48 s

numpy.solve: solve a linear system: I call this routine ~10,000 times for a total time of 2.557 s

In total my code runs in ~ 11 sec whereas my fortran code takes .092 sec. Is this a joke? I'm really not trying to be unrealistic in my expectations of python, and I certainly don't expect to get my python code to be as fast as Fortran.. but to be slower by a factor of > 100.. Python's gotta be able to do better than that. Just in case you are curious, here is the full output of my profiler:( I don't know why it broke the text into several blocks)

     1290611 function calls in 11.296 CPU seconds

Ordered by: internal time, function name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

18973    0.864    0.000    3.552    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:484(rvs)
 9976    0.819    0.000    2.480    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:1559(slogdet)
 9976    0.627    0.000    6.659    0.001 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:77(evaluate_posterior)
 9384    0.591    0.000    0.753    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:39(construct_R_matrix)
77852    0.533    0.000    0.533    0.000 :0(array)
37946    0.520    0.000    1.489    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
77851    0.423    0.000    0.956    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:216(asarray)
37946    0.360    0.000    0.360    0.000 :0(all)
 9976    0.335    0.000    2.557    0.000 /usr/lib64/python2.6/sitepackages/scipy/linalg/basic.py:23(solve)
107799    0.322    0.000    0.322    0.000 :0(len)

109740    0.301    0.000    0.301    0.000 :0(issubclass)

28357    0.294    0.000    0.294    0.000 :0(prod)
 9976    0.287    0.000    0.957    0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:45(find_best_lapack_type)
    1    0.282    0.282   11.294   11.294 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:199(get_rho_lambda_draws)
 9976    0.269    0.000    1.386    0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:60(get_lapack_funcs)
19952    0.263    0.000    0.476    0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:23(cast_to_lapack_prefix)
19952    0.235    0.000    0.669    0.000 /usr/lib64/python2.6/site-packages/numpy/lib/function_base.py:483(asarray_chkfinite)
66833    0.212    0.000    0.212    0.000 :0(log)
18973    0.207    0.000    1.054    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1427(product)
29931    0.205    0.000    0.205    0.000 :0(reduce)
28949    0.187    0.000    0.856    0.000 :0(map)
 9976    0.175    0.000    0.175    0.000 :0(dot)
47922    0.163    0.000    0.163    0.000 :0(getattr)
 9976    0.157    0.000    0.206    0.000 /usr/lib64/python2.6/site-packages/numpy/lib/twodim_base.py:169(eye)
19952    0.154    0.000    0.271    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:32(loggbeta)
18973    0.151    0.000    0.793    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1548(all)
19953    0.146    0.000    0.146    0.000 :0(any)
 9976    0.142    0.000    0.316    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:99(_commonType)
 9976    0.133    0.000    0.133    0.000 :0(dgetrf)
18973    0.125    0.000    0.175    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:462(_fix_loc_scale)
39904    0.117    0.000    0.117    0.000 :0(append)
18973    0.105    0.000    0.292    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1461(alltrue)
19952    0.102    0.000    0.102    0.000 :0(zeros)
19952    0.093    0.000    0.154    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:71(isComplexType)
19952    0.090    0.000    0.090    0.000 :0(split)
 9976    0.089    0.000    2.569    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:62(get_log_determinant_of_matrix)
19952    0.087    0.000    0.134    0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:35(logggamma)
 9976    0.083    0.000    0.154    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:139(_fastCopyAndTranspose)
 9976    0.076    0.000    0.125    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:157(_assertSquareness)
 9976    0.074    0.000    0.097    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:151(_assertRank2)
 9976    0.072    0.000    0.119    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:127(_to_native_byte_order)
18973    0.072    0.000    0.072    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:832(_argcheck)
 9976    0.072    0.000    0.228    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:901(diagonal)
 9976    0.070    0.000    0.070    0.000 :0(arange)
 9976    0.061    0.000    0.061    0.000 :0(diagonal)
 9976    0.055    0.000    0.055    0.000 :0(sum)
 9976    0.053    0.000    0.075    0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:84(_realType)
11996    0.050    0.000    0.091    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:1412(_rvs)
 9384    0.047    0.000    0.162    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1898(prod)
 9976    0.045    0.000    0.045    0.000 :0(sort)
11996    0.041    0.000    0.041    0.000 :0(standard_normal)
 9976    0.037    0.000    0.037    0.000 :0(_fastCopyAndTranspose)
 9976    0.037    0.000    0.037    0.000 :0(hasattr)
 9976    0.037    0.000    0.037    0.000 :0(range)
 6977    0.034    0.000    0.055    0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:3731(_rvs)
 9977    0.027    0.000    0.027    0.000 :0(max)
 9976    0.023    0.000    0.023    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:498(isfortran)
 9977    0.022    0.000    0.022    0.000 :0(min)
 9976    0.022    0.000    0.022    0.000 :0(get)
 6977    0.021    0.000    0.021    0.000 :0(uniform)
    1    0.001    0.001   11.295   11.295 <string>:1(<module>)
    1    0.001    0.001   11.296   11.296 profile:0(get_rho_lambda_draws(correlations,energies,rho_priors,lambda_e_prior,lambda_z_prior,candidate_sig2_rhos,candidate_sig2_lambda_e,candidate_sig2_lambda_z,3000))
    2    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:445(__call__)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:385(__init__)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:175(_array2string)
    2    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:475(_digits)
    2    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:309(_extendLine)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:317(_formatArray)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1477(any)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:243(array2string)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:1390(array_str)
    1    0.000    0.000    0.000    0.000 :0(compress)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:394(fillFormat)
    6    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:2166(geterr)
   12    0.000    0.000    0.000    0.000 :0(geterrobj)
    0    0.000             0.000          profile:0(profiler)
    1    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1043(ravel)
    1    0.000    0.000    0.000    0.000 :0(ravel)
    8    0.000    0.000    0.000    0.000 :0(rstrip)
    6    0.000    0.000    0.000    0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:2070(seterr)
    6    0.000    0.000    0.000    0.000 :0(seterrobj)
    1    0.000    0.000    0.000    0.000 :0(setprofile)

EDIT:

Here is copy of the relevant routines

def get_rho_lambda_draws(correlations, energies, rho_priors, lam_e_prior, lam_z_prior,  
                         candidate_sig2_rhos, candidate_sig2_lambda_e, 
                         candidate_sig2_lambda_z, ndraws):

    nBasis = len(correlations[0])
    nStruct = len(correlations)

    rho _draws = [ [0.5 for x in xrange(nBasis)] for y in xrange(ndraws)]
    lambda_e_draws = [ 5 for x in xrange(ndraws)]
    lambda_z_draws = [ 5 for x in xrange(ndraws)]
            
    accept_rhos = array([0. for x in xrange(nBasis)])
    accept_lambda_e = 0.
    accept_lambda_z = 0.

    for i in xrange(1,ndraws):
        if i % 100 == 0:
            print i, "REP<---------------------------------------------------------------------------------"
        #do metropolis to get rho
        rho_draws[i] = [x for x in rho_draws[i-1]]
        lambda_e_draws[i] = lambda_e_draws[i-1]
        lambda_z_draws[i] = lambda_z_draws[i-1]

        rho_vec = [x for x in rho_draws[i-1]]
        R_matrix_before =construct_R_matrix(correlations,correlations,rho_vec)
        post_before = evaluate_posterior(R_matrix_before,rho_vec,energies,lambda_e_draws[i-1],lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)

        index = 0
        for j in xrange(nBasis):
            cand = norm.rvs(rho_draws[i-1][j],scale=candidate_sig2_rhos[j])
            if 0.0 < cand < 1.0:
                rho_vec[j] = cand

                R_matrix_after = construct_R_matrix(correlations,correlations,rho_vec)
                post_after = evaluate_posterior(R_matrix_after,rho_vec,energies,lambda_e_draws[i-1],lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)
                metrop_value = post_after - post_before
                unif = log(uniform.rvs(0,1))
                if metrop_value > unif:
                    rho_draws[i][j] = cand
                    post_before = post_after
                    accept_rhos[j] += 1
                else:
                    rho_vec[j] = rho_draws[i-1][j]



        R_matrix = construct_R_matrix(correlations,correlations,rho_vec)
        cand = norm.rvs(lambda_e_draws[i-1],scale=candidate_sig2_lambda_e)
        if cand > 0.0:
            post_after = evaluate_posterior(R_matrix,rho_vec,energies,cand,lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)

            metrop_value = post_after - post_before
            unif = log(uniform.rvs(0,1))
            if metrop_value > unif:
                lambda_e_draws[i] = cand
                post_before = post_after
                accept_lambda_e = accept_lambda_e + 1


        cand = norm.rvs(lambda_z_draws[i-1],scale=candidate_sig2_lambda_z)
        if cand > 0.0:
            post_after = evaluate_posterior(R_matrix,rho_vec,energies,lambda_e_draws[i],cand,lam_e_prior,lam_z_prior,rho_priors)
            metrop_value = post_after - post_before
            unif = log(uniform.rvs(0,1))
            if metrop_value > unif:
                lambda_z_draws[i] = cand
                post_before = post_after
                accept_lambda_z = accept_lambda_z + 1


    print accept_rhos/ndraws
    print accept_lambda_e/ndraws
    print accept_lambda_z/ndraws
    return [rho_draws,lambda_e_draws,lambda_z_draws]


def evaluate_posterior(R_matrix,rho_vec,energies,lambda_e,lambda_z,lam_e_prior,lam_z_prior,rho_prior_params):

    #    from scipy.linalg import solve
    #from numpy import allclose

    working_matrix = eye(len(R_matrix))/lambda_e + R_matrix/lambda_z
    logdet = get_log_determinant_of_matrix(working_matrix)

    x = solve(working_matrix,energies,sym_pos=True)
    #    if not allclose(dot(working_matrix,x),energies):
#        exit('solve routine didnt work')

    rho_priors = sum([loggbeta(rho_vec[j],rho_prior_params[j][0],rho_prior_params[j][1]) for j in xrange(len(rho_vec))])

    loggposterior = -.5 * logdet - .5*dot(energies,x) + logggamma(lambda_e,lam_e_prior[0],lam_e_prior[1]) + logggamma(lambda_z,lam_z_prior[0],lam_z_prior[1]) + rho_priors #(a_e-1)*log(lambda_e) - b_e*lambda_e + (a_z-1)*log(lambda_z) - b_z*lambda_z + rho_priors
    return loggposterior

def construct_R_matrix(listone,listtwo,rhos):

    return prod(rhos[:]**(4*(listone[:,newaxis]-listtwo)**2),axis=2)

(Once again... I don't know why It breaks my input up into several blocks when I post.. I hope you can decifer it)

like image 952
legoses Avatar asked Mar 19 '12 23:03

legoses


People also ask

Why NumPy Where is fast?

NumPy is fast because it can do all its calculations without calling back into Python. Since this function involves looping in Python, we lose all the performance benefits of using NumPy. For a 10,000,000-entry NumPy array, this functions takes 2.5 seconds to run on my computer.

Is NumPy as fast as C++?

however then again, the use of Numpy and/or Scipy with Python may additionally get your code to run just as fast as a native C++ software (in which the code in C++ could sometimes take longer to broaden).

Is NumPy faster than Fortran?

Numpy is restricted to single-threading in many cases, it requires to manually vectorization of code which can become hard to read, and is much slower than the two alternatives. Fortran offers great performance with pretty simple code.

Is Numba faster than NumPy?

Numba Sort is Significantly Slower Than NumPy Sort.


3 Answers

It is hard to tell exactly what's going on with your code, But my suspicion is that you just have some data which is not (or could not be) very vectorized. Because obviously the call of .rvs() 19000 times is going to be way slower than the .rvs(size=19000). See:

  In [5]: %timeit x=[scipy.stats.norm().rvs() for i in range(19000)]
  1 loops, best of 3: 1.23 s per loop

  In [6]: %timeit x=scipy.stats.norm().rvs(size=19000)
  1000 loops, best of 3: 1.67 ms per loop

So if you indeed have a not very vectorized code or algorithm it is well expected to be slower than fortran.

like image 60
sega_sai Avatar answered Nov 03 '22 06:11

sega_sai


Check out the performance page created by the SciPy/NumPy folks. There are a number of remarkably easy extras that foster very fast code. Among them are (a) using the weave module, especially the inline and blitz options. (b) Using Cython to write some of your functions in C but be able to call and use them in Python.

I do a lot of large-scale scientific computing work in Python for statistics, finance, and (in grad school) computer vision. The reason why Python is excellent for these kinds of issues is not that my naive, first hack code would yield the fastest solution, but because in Python I can easily interface with tons of other tasks. I can easily issue Linux commands for other programs, easily read and parse most data files, easily interface with SQL and other databse software; I have all of the R statistics library available, use of OpenCV commands (in much much nicer syntax that the C++ version), and much more.

When the importance of my task was to manipulate a new dataset and get my hands dirty, feeling out the nuances of that data, then Python's ease of programming, along with matplotlib, made it much better. Later on, when I need to scale things up, I can always use PyCUDA, Cython, or just rewrite things in C++ if high-end performance is required. Since most machines have multiprocessors now, the multiprocessing module, as well as mpi4py, allow me to quickly and cheaply turn annoying for-loop style tasks into much shorter tasks, without needing to migrate to C++.

In short, the real utility of Python doesn't come from the language all by itself, but from becoming really proficient with the add-ons and extras that let you cheaply make your little set of common problems execute quickly on the data sets that matter in the day-to-day.

Real-time embedded communications software is going to be using C++ for a long time to come... same for high-frequency trading strategies. But then again, professional solutions to these types of things is not really what Python is meant for. And in some cases, folks prefer unusual solutions for that stuff anyway.

like image 5
ely Avatar answered Nov 03 '22 05:11

ely


Get rid of the two for loops and two list comprehensions by replacing them with Numpy functions and constructs that use numpy.ndarrays. Also do not print in between the computation - that is also slow. You can probably get 10-50 fold speed increase just by following the above advice.

Also see http://www.scipy.org/PerformancePython/

like image 1
peterhil Avatar answered Nov 03 '22 05:11

peterhil