For a university exercise, we had to implement a Leapfrog integrator with exact Newtonian forces in Python. The course is over and our solutions were all more than good enough, but I wondered whether and how the performance of the force calculation could be improved further.
The bottleneck is calculating all forces (i.e. accelerations)
a_i = Σ_{j≠i} Gm_j · (r_j − r_i) / |r_j − r_i|³
for a large number of particles N (1000 and more), with i, j < N.
Here r_i and r_j are the 3-dimensional position vectors of the particles, stored in an ndarray of shape (N, 3), and Gm is the particle mass times the gravitational constant, which I saved in an ndarray of shape (N,).
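Written out as a plain double loop (only to illustrate the formula, of course far too slow to actually use), the calculation would be something like:
import numpy as np

def a_naive(r, Gm):
    """Reference O(N**2) double loop over all pairs (illustration only)."""
    N = len(r)
    acc = np.zeros((N, 3))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = r[j] - r[i]                           # r_j - r_i
            acc[i] += Gm[j] * d / np.linalg.norm(d)**3
    return acc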
The fastest version I found so far is the following:
# method of the simulation class: self.r has shape (N, 3), self.Gm shape (N,)
# needs: import numpy as np; from scipy.spatial.distance import cdist
def a(self):
    sep = self.r[np.newaxis, :] - self.r[:, np.newaxis]
    # Calculate the distances between all particles with cdist;
    # this is much faster than doing it by hand
    dists = cdist(self.r, self.r)
    scale = dists * dists * dists
    # set diagonal elements of scale to something != 0, to avoid division by 0
    np.fill_diagonal(scale, 1)
    Fsum = (sep / scale.reshape(self.particlenr, self.particlenr, 1)) * self.Gm[:, None]
    return np.add.reduce(Fsum, axis=1)
But it bugs me that this is probably not the fastest version. The first line seems too slow compared to cdist, which does essentially the same calculation. Also, this solution does not exploit the symmetry of the problem under exchanging r_i and r_j, so it calculates every element twice.
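Exploiting that symmetry could look roughly like the sketch below (computing each pair only once via np.triu_indices and reusing it with opposite sign via np.add.at); I have not benchmarked whether the extra indexing actually pays off:
import numpy as np

def a_symmetric(r, Gm):
    """Sketch: compute every pair (i, j) with i < j only once (Newton's third law)."""
    N = len(r)
    acc = np.zeros((N, 3))
    iu, ju = np.triu_indices(N, k=1)            # all pairs with i < j
    sep = r[ju] - r[iu]                          # r_j - r_i for each pair
    f = sep / np.linalg.norm(sep, axis=1)[:, None]**3
    np.add.at(acc, iu, f * Gm[ju][:, None])      # effect of j on i
    np.add.at(acc, ju, -f * Gm[iu][:, None])     # equal and opposite effect of i on j
    return acc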
Do you know of any performance improvements (without switching to an approximate force calculation or to another programming language)?
I'll give it a try: I implemented a routine which determines a single a_i:
import numpy as np

GM = .01  # particle mass times the gravitational constant

def calc_a_i(rr, i):
    """ Calculate one a_i """
    drr = rr - rr[i, :]                   # r_j - r_i
    dr3 = np.linalg.norm(drr, axis=1)**3  # |r_j - r_i|**3
    dr3[i] = 1                            # case i == j: drr = [0, 0, 0]
    # this would be more robust (eliminate small denominators):
    # dr3 = np.where(np.abs(dr3) > 1e-12, dr3, 1)
    return np.sum(drr.T/dr3, axis=1)

n = 4000                     # number of particles
rr = np.random.randn(n, 3)   # generate some particles

# Calculate each a_i separately:
aa = np.array([calc_a_i(rr, i) for i in range(n)]) * GM  # all a_i
To test it, I ran:
In [1]: %timeit aa = np.array([calc_a_i(rr, i) for i in range(n)])
1 loops, best of 3: 2.93 s per loop
The easiest way to speed up such code is to use numexpr for faster evaluation of array expressions:
import numexpr as ne

ne.set_num_threads(1)  # multithreading causes too much overhead here

def ne_calc_a_i(i):
    """ Use numexpr - here rr is global for easier parallelization """
    dr1, dr2, dr3 = (rr - rr[i, :]).T  # r_j - r_i
    drrp3 = ne.evaluate("sqrt(dr1**2 + dr2**2 + dr3**2)**3")
    drrp3[i] = 1
    return np.sum(np.vstack([dr1, dr2, dr3])/drrp3, axis=1)

# Calculate each a_i separately:
aa_ne = np.array([ne_calc_a_i(i) for i in range(n)]) * GM  # all a_i
This improves the speed by a factor of 2:
In [2]: %timeit aa_ne = np.array([ne_calc_a_i(i) for i in range(n)])
1 loops, best of 3: 1.29 s per loop
To speed up the code further, let's run it on an IPython cluster:
# Start a local cluster with 4 engines in a shell with:
#   ipcluster start -n 4
from ipyparallel import Client  # in older IPython versions: from IPython.parallel import Client

rc = Client()   # connect to the cluster
dview = rc[:]   # direct view of all engines

dview.execute("import numpy as np")  # import libraries on the engines
dview.execute("import numexpr as ne")
dview.execute("ne.set_num_threads(1)")

def para_calc_a(dview, rr):
    """ Only wrapped in a function for %timeit """
    # send rr and ne_calc_a_i() to the engines:
    dview.push(dict(rr=rr, ne_calc_a_i=ne_calc_a_i), block=True)
    return np.array(dview.map_sync(ne_calc_a_i, range(n))) * GM
Compared with the original version, the speedup is more than a factor of four:
In [3]: %timeit aa_p = para_calc_a(dview, rr)
1 loops, best of 3: 612 ms per loop
As @mathdan already noted, it is not obvious how to optimize such a problem: whether the memory bus or the floating point unit is the limiting factor depends on your CPU architecture, and each case calls for different techniques.
For further gains you might want to look at Theano: it can dynamically generate GPU code from Python.
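I have not tried it for this problem, but a symbolic formulation that Theano could compile (for CPU or GPU) might look roughly like the following sketch; the names calc_a, r and gm are just placeholders of mine, and whether it actually beats the NumPy/numexpr versions would have to be measured:
# sketch only, untested for this problem
import numpy as np
import theano
import theano.tensor as T

r = T.matrix('r')    # positions, shape (N, 3)
gm = T.vector('gm')  # G * m_j, shape (N,)

sep = r.dimshuffle('x', 0, 1) - r.dimshuffle(0, 'x', 1)  # r_j - r_i, shape (N, N, 3)
dist = T.sqrt(T.sum(sep ** 2, axis=2))                   # |r_j - r_i|, shape (N, N)
scale = T.switch(T.eq(dist, 0), 1.0, dist ** 3)          # avoid division by zero on the diagonal
acc = T.sum(sep / scale.dimshuffle(0, 1, 'x') * gm.dimshuffle('x', 0, 'x'), axis=1)

calc_a = theano.function([r, gm], acc)

rr = np.random.randn(1000, 3).astype(theano.config.floatX)
aa_th = calc_a(rr, np.full(1000, 0.01, dtype=theano.config.floatX))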