I'm using python to do some Bayesian statistics. I've coded it up in python and in Fortran 95. The Fortran code is waaay faster... like a factor of 100. I expected the Fortran to be faster, but I was really hoping that by using numpy I could get the python code to come close, maybe within a factor of 2. I've profiled the python code and it looks like the majority of the time is spent doing the following things:
scipy.stats.rvs: taking a random draw from a distribution. I do this ~19000 times and it takes a total time of 3.552 sec
numpy.slogdet: computing the log of the determinant of a matrix. I do this ~10,000 and it takes a total of 2.48 s
numpy.solve: solve a linear system: I call this routine ~10,000 times for a total time of 2.557 s
In total my code runs in ~ 11 sec whereas my fortran code takes .092 sec. Is this a joke? I'm really not trying to be unrealistic in my expectations of python, and I certainly don't expect to get my python code to be as fast as Fortran.. but to be slower by a factor of > 100.. Python's gotta be able to do better than that. Just in case you are curious, here is the full output of my profiler:( I don't know why it broke the text into several blocks)
1290611 function calls in 11.296 CPU seconds
Ordered by: internal time, function name
ncalls tottime percall cumtime percall filename:lineno(function)
18973 0.864 0.000 3.552 0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:484(rvs)
9976 0.819 0.000 2.480 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:1559(slogdet)
9976 0.627 0.000 6.659 0.001 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:77(evaluate_posterior)
9384 0.591 0.000 0.753 0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:39(construct_R_matrix)
77852 0.533 0.000 0.533 0.000 :0(array)
37946 0.520 0.000 1.489 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
77851 0.423 0.000 0.956 0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:216(asarray)
37946 0.360 0.000 0.360 0.000 :0(all)
9976 0.335 0.000 2.557 0.000 /usr/lib64/python2.6/sitepackages/scipy/linalg/basic.py:23(solve)
107799 0.322 0.000 0.322 0.000 :0(len)
109740 0.301 0.000 0.301 0.000 :0(issubclass)
28357 0.294 0.000 0.294 0.000 :0(prod)
9976 0.287 0.000 0.957 0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:45(find_best_lapack_type)
1 0.282 0.282 11.294 11.294 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:199(get_rho_lambda_draws)
9976 0.269 0.000 1.386 0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:60(get_lapack_funcs)
19952 0.263 0.000 0.476 0.000 /usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py:23(cast_to_lapack_prefix)
19952 0.235 0.000 0.669 0.000 /usr/lib64/python2.6/site-packages/numpy/lib/function_base.py:483(asarray_chkfinite)
66833 0.212 0.000 0.212 0.000 :0(log)
18973 0.207 0.000 1.054 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1427(product)
29931 0.205 0.000 0.205 0.000 :0(reduce)
28949 0.187 0.000 0.856 0.000 :0(map)
9976 0.175 0.000 0.175 0.000 :0(dot)
47922 0.163 0.000 0.163 0.000 :0(getattr)
9976 0.157 0.000 0.206 0.000 /usr/lib64/python2.6/site-packages/numpy/lib/twodim_base.py:169(eye)
19952 0.154 0.000 0.271 0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:32(loggbeta)
18973 0.151 0.000 0.793 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1548(all)
19953 0.146 0.000 0.146 0.000 :0(any)
9976 0.142 0.000 0.316 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:99(_commonType)
9976 0.133 0.000 0.133 0.000 :0(dgetrf)
18973 0.125 0.000 0.175 0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:462(_fix_loc_scale)
39904 0.117 0.000 0.117 0.000 :0(append)
18973 0.105 0.000 0.292 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1461(alltrue)
19952 0.102 0.000 0.102 0.000 :0(zeros)
19952 0.093 0.000 0.154 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:71(isComplexType)
19952 0.090 0.000 0.090 0.000 :0(split)
9976 0.089 0.000 2.569 0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:62(get_log_determinant_of_matrix)
19952 0.087 0.000 0.134 0.000 /bluehome/legoses/bce/bayes_GP_integrated_out/python/ce_funcs.py:35(logggamma)
9976 0.083 0.000 0.154 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:139(_fastCopyAndTranspose)
9976 0.076 0.000 0.125 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:157(_assertSquareness)
9976 0.074 0.000 0.097 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:151(_assertRank2)
9976 0.072 0.000 0.119 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:127(_to_native_byte_order)
18973 0.072 0.000 0.072 0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:832(_argcheck)
9976 0.072 0.000 0.228 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:901(diagonal)
9976 0.070 0.000 0.070 0.000 :0(arange)
9976 0.061 0.000 0.061 0.000 :0(diagonal)
9976 0.055 0.000 0.055 0.000 :0(sum)
9976 0.053 0.000 0.075 0.000 /usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py:84(_realType)
11996 0.050 0.000 0.091 0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:1412(_rvs)
9384 0.047 0.000 0.162 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1898(prod)
9976 0.045 0.000 0.045 0.000 :0(sort)
11996 0.041 0.000 0.041 0.000 :0(standard_normal)
9976 0.037 0.000 0.037 0.000 :0(_fastCopyAndTranspose)
9976 0.037 0.000 0.037 0.000 :0(hasattr)
9976 0.037 0.000 0.037 0.000 :0(range)
6977 0.034 0.000 0.055 0.000 /usr/lib64/python2.6/site-packages/scipy/stats/distributions.py:3731(_rvs)
9977 0.027 0.000 0.027 0.000 :0(max)
9976 0.023 0.000 0.023 0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:498(isfortran)
9977 0.022 0.000 0.022 0.000 :0(min)
9976 0.022 0.000 0.022 0.000 :0(get)
6977 0.021 0.000 0.021 0.000 :0(uniform)
1 0.001 0.001 11.295 11.295 <string>:1(<module>)
1 0.001 0.001 11.296 11.296 profile:0(get_rho_lambda_draws(correlations,energies,rho_priors,lambda_e_prior,lambda_z_prior,candidate_sig2_rhos,candidate_sig2_lambda_e,candidate_sig2_lambda_z,3000))
2 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:445(__call__)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:385(__init__)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:175(_array2string)
2 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:475(_digits)
2 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:309(_extendLine)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:317(_formatArray)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1477(any)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:243(array2string)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:1390(array_str)
1 0.000 0.000 0.000 0.000 :0(compress)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/arrayprint.py:394(fillFormat)
6 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:2166(geterr)
12 0.000 0.000 0.000 0.000 :0(geterrobj)
0 0.000 0.000 profile:0(profiler)
1 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:1043(ravel)
1 0.000 0.000 0.000 0.000 :0(ravel)
8 0.000 0.000 0.000 0.000 :0(rstrip)
6 0.000 0.000 0.000 0.000 /usr/lib64/python2.6/site-packages/numpy/core/numeric.py:2070(seterr)
6 0.000 0.000 0.000 0.000 :0(seterrobj)
1 0.000 0.000 0.000 0.000 :0(setprofile)
EDIT:
Here is copy of the relevant routines
def get_rho_lambda_draws(correlations, energies, rho_priors, lam_e_prior, lam_z_prior,
candidate_sig2_rhos, candidate_sig2_lambda_e,
candidate_sig2_lambda_z, ndraws):
nBasis = len(correlations[0])
nStruct = len(correlations)
rho _draws = [ [0.5 for x in xrange(nBasis)] for y in xrange(ndraws)]
lambda_e_draws = [ 5 for x in xrange(ndraws)]
lambda_z_draws = [ 5 for x in xrange(ndraws)]
accept_rhos = array([0. for x in xrange(nBasis)])
accept_lambda_e = 0.
accept_lambda_z = 0.
for i in xrange(1,ndraws):
if i % 100 == 0:
print i, "REP<---------------------------------------------------------------------------------"
#do metropolis to get rho
rho_draws[i] = [x for x in rho_draws[i-1]]
lambda_e_draws[i] = lambda_e_draws[i-1]
lambda_z_draws[i] = lambda_z_draws[i-1]
rho_vec = [x for x in rho_draws[i-1]]
R_matrix_before =construct_R_matrix(correlations,correlations,rho_vec)
post_before = evaluate_posterior(R_matrix_before,rho_vec,energies,lambda_e_draws[i-1],lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)
index = 0
for j in xrange(nBasis):
cand = norm.rvs(rho_draws[i-1][j],scale=candidate_sig2_rhos[j])
if 0.0 < cand < 1.0:
rho_vec[j] = cand
R_matrix_after = construct_R_matrix(correlations,correlations,rho_vec)
post_after = evaluate_posterior(R_matrix_after,rho_vec,energies,lambda_e_draws[i-1],lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)
metrop_value = post_after - post_before
unif = log(uniform.rvs(0,1))
if metrop_value > unif:
rho_draws[i][j] = cand
post_before = post_after
accept_rhos[j] += 1
else:
rho_vec[j] = rho_draws[i-1][j]
R_matrix = construct_R_matrix(correlations,correlations,rho_vec)
cand = norm.rvs(lambda_e_draws[i-1],scale=candidate_sig2_lambda_e)
if cand > 0.0:
post_after = evaluate_posterior(R_matrix,rho_vec,energies,cand,lambda_z_draws[i-1],lam_e_prior,lam_z_prior,rho_priors)
metrop_value = post_after - post_before
unif = log(uniform.rvs(0,1))
if metrop_value > unif:
lambda_e_draws[i] = cand
post_before = post_after
accept_lambda_e = accept_lambda_e + 1
cand = norm.rvs(lambda_z_draws[i-1],scale=candidate_sig2_lambda_z)
if cand > 0.0:
post_after = evaluate_posterior(R_matrix,rho_vec,energies,lambda_e_draws[i],cand,lam_e_prior,lam_z_prior,rho_priors)
metrop_value = post_after - post_before
unif = log(uniform.rvs(0,1))
if metrop_value > unif:
lambda_z_draws[i] = cand
post_before = post_after
accept_lambda_z = accept_lambda_z + 1
print accept_rhos/ndraws
print accept_lambda_e/ndraws
print accept_lambda_z/ndraws
return [rho_draws,lambda_e_draws,lambda_z_draws]
def evaluate_posterior(R_matrix,rho_vec,energies,lambda_e,lambda_z,lam_e_prior,lam_z_prior,rho_prior_params):
# from scipy.linalg import solve
#from numpy import allclose
working_matrix = eye(len(R_matrix))/lambda_e + R_matrix/lambda_z
logdet = get_log_determinant_of_matrix(working_matrix)
x = solve(working_matrix,energies,sym_pos=True)
# if not allclose(dot(working_matrix,x),energies):
# exit('solve routine didnt work')
rho_priors = sum([loggbeta(rho_vec[j],rho_prior_params[j][0],rho_prior_params[j][1]) for j in xrange(len(rho_vec))])
loggposterior = -.5 * logdet - .5*dot(energies,x) + logggamma(lambda_e,lam_e_prior[0],lam_e_prior[1]) + logggamma(lambda_z,lam_z_prior[0],lam_z_prior[1]) + rho_priors #(a_e-1)*log(lambda_e) - b_e*lambda_e + (a_z-1)*log(lambda_z) - b_z*lambda_z + rho_priors
return loggposterior
def construct_R_matrix(listone,listtwo,rhos):
return prod(rhos[:]**(4*(listone[:,newaxis]-listtwo)**2),axis=2)
(Once again... I don't know why It breaks my input up into several blocks when I post.. I hope you can decifer it)
NumPy is fast because it can do all its calculations without calling back into Python. Since this function involves looping in Python, we lose all the performance benefits of using NumPy. For a 10,000,000-entry NumPy array, this functions takes 2.5 seconds to run on my computer.
however then again, the use of Numpy and/or Scipy with Python may additionally get your code to run just as fast as a native C++ software (in which the code in C++ could sometimes take longer to broaden).
Numpy is restricted to single-threading in many cases, it requires to manually vectorization of code which can become hard to read, and is much slower than the two alternatives. Fortran offers great performance with pretty simple code.
Numba Sort is Significantly Slower Than NumPy Sort.
It is hard to tell exactly what's going on with your code, But my suspicion is that you just have some data which is not (or could not be) very vectorized. Because obviously the call of .rvs() 19000 times is going to be way slower than the .rvs(size=19000). See:
In [5]: %timeit x=[scipy.stats.norm().rvs() for i in range(19000)]
1 loops, best of 3: 1.23 s per loop
In [6]: %timeit x=scipy.stats.norm().rvs(size=19000)
1000 loops, best of 3: 1.67 ms per loop
So if you indeed have a not very vectorized code or algorithm it is well expected to be slower than fortran.
Check out the performance page created by the SciPy/NumPy folks. There are a number of remarkably easy extras that foster very fast code. Among them are (a) using the weave
module, especially the inline
and blitz
options. (b) Using Cython to write some of your functions in C but be able to call and use them in Python.
I do a lot of large-scale scientific computing work in Python for statistics, finance, and (in grad school) computer vision. The reason why Python is excellent for these kinds of issues is not that my naive, first hack code would yield the fastest solution, but because in Python I can easily interface with tons of other tasks. I can easily issue Linux commands for other programs, easily read and parse most data files, easily interface with SQL and other databse software; I have all of the R statistics library available, use of OpenCV commands (in much much nicer syntax that the C++ version), and much more.
When the importance of my task was to manipulate a new dataset and get my hands dirty, feeling out the nuances of that data, then Python's ease of programming, along with matplotlib, made it much better. Later on, when I need to scale things up, I can always use PyCUDA, Cython, or just rewrite things in C++ if high-end performance is required. Since most machines have multiprocessors now, the multiprocessing module, as well as mpi4py, allow me to quickly and cheaply turn annoying for-loop style tasks into much shorter tasks, without needing to migrate to C++.
In short, the real utility of Python doesn't come from the language all by itself, but from becoming really proficient with the add-ons and extras that let you cheaply make your little set of common problems execute quickly on the data sets that matter in the day-to-day.
Real-time embedded communications software is going to be using C++ for a long time to come... same for high-frequency trading strategies. But then again, professional solutions to these types of things is not really what Python is meant for. And in some cases, folks prefer unusual solutions for that stuff anyway.
Get rid of the two for loops and two list comprehensions by replacing them with Numpy functions and constructs that use numpy.ndarrays. Also do not print in between the computation - that is also slow. You can probably get 10-50 fold speed increase just by following the above advice.
Also see http://www.scipy.org/PerformancePython/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With