I'm writing a scientific application in Python with a very processor-intensive loop at its core. I would like to optimise this as far as possible, at minimum inconvenience to end users, who will probably use it as an uncompiled collection of Python scripts, and will be using Windows, Mac, and (mainly Ubuntu) Linux.
It is currently written in Python with a dash of NumPy, and I've included the code below.
(If you're interested, the loop is to calculate the magnetic field at a given point inside a crystal by adding together the contributions of a large number of nearby magnetic ions, treated as tiny bar magnets. Basically, a massive sum of these.)
# calculate_dipole
# -------------------------
# calculate_dipole works out the dipole field at a given point within the crystal unit cell
# ---
# INPUT
# mu = position at which to calculate the dipole field
# r_i = array of atomic positions
# mom_i = corresponding array of magnetic moments
# ---
# OUTPUT
# B = the B-field at this point
def calculate_dipole(mu, r_i, mom_i):
relative = mu - r_i
r_unit = unit_vectors(relative)
#4pi / mu0 (at the front of the dipole eqn)
A = 1e-7
#initalise dipole field
B = zeros(3,float)
for i in range(len(relative)):
#work out the dipole field and add it to the estimate so far
B += A*(3*dot(mom_i[i],r_unit[i])*r_unit[i] - mom_i[i]) / sqrt(dot(relative[i],relative[i]))**3
return B
You can get this to run much, much faster if you eliminate the loop and use Numpy's vectorized operations. Put your data in numpy arrays of shape (3,N) and try the following:
import numpy as np
N = 20000
mu = np.random.random((3,1))
r_i = np.random.random((3,N))
mom_i = np.random.random((3,N))
def unit_vectors(r):
return r / np.sqrt((r*r).sum(0))
def calculate_dipole(mu, r_i, mom_i):
relative = mu - r_i
r_unit = unit_vectors(relative)
A = 1e-7
num = A*(3*np.sum(mom_i*r_unit, 0)*r_unit - mom_i)
den = np.sqrt(np.sum(relative*relative, 0))**3
B = np.sum(num/den, 1)
return B
This runs about 50 times faster for me than using a for loop.
Numpy does use some native optimization for array processing. You can use Numpy array with Cython to gain some speed-ups.
Your python code could probably be speeded up a bit by replacing your loop with a generator expression and removing all the lookups of mom_i[i], relative[i] and r_unit[i] by iterating through all three sequences in parallel using itertools.izip.
i.e. replace
B = zeros(3,float)
for i in range(len(relative)):
#work out the dipole field and add it to the estimate so far
B += A*(3*dot(mom_i[i],r_unit[i])*r_unit[i] - mom_i[i]) / sqrt(dot(relative[i],relative[i]))**3
return B
with:
from itertools import izip
...
return sum((A*(3*dot(mom,ru)*ru - mom) / sqrt(dot(rel,rel))**3
for mom, ru, rel in izip(mom_i, r_unit, relative)),
zeros(3,float))
This is also more readable IMHO since the core equation is not cluttered with [i] everywhere..
I suspect however that this will only get you marginal gains compared to doing the whole function in a compiled language such as Cython.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With