What is the most platform- and Python-version-independent way to make a fast loop for use in Python?

Question

I'm writing a scientific application in Python with a very processor-intensive loop at its core. I would like to optimise this as far as possible, at minimum inconvenience to end users, who will probably use it as an uncompiled collection of Python scripts, and will be using Windows, Mac, and (mainly Ubuntu) Linux.

It is currently written in Python with a dash of NumPy, and I've included the code below.

Is there a solution which would be reasonably fast which would not require compilation? This would seem to be the easiest way to maintain platform-independence.
If using something like Pyrex, which does require compilation, is there an easy way to bundle many modules and have Python choose between them depending on detected OS and Python version? Is there an easy way to build the collection of modules without needing access to every system with every version of Python?
Does one method lend itself particularly to multi-processor optimisation?

(If you're interested, the loop is to calculate the magnetic field at a given point inside a crystal by adding together the contributions of a large number of nearby magnetic ions, treated as tiny bar magnets. Basically, a massive sum of these.)

# calculate_dipole
# -------------------------
# calculate_dipole works out the dipole field at a given point within the crystal unit cell
# ---
# INPUT
# mu = position at which to calculate the dipole field
# r_i = array of atomic positions
# mom_i = corresponding array of magnetic moments
# ---
# OUTPUT
# B = the B-field at this point

def calculate_dipole(mu, r_i, mom_i):
    relative = mu - r_i
    r_unit = unit_vectors(relative)
    #4pi / mu0 (at the front of the dipole eqn)
    A = 1e-7
    #initalise dipole field
    B = zeros(3,float)

    for i in range(len(relative)):
        #work out the dipole field and add it to the estimate so far
        B += A*(3*dot(mom_i[i],r_unit[i])*r_unit[i] - mom_i[i]) / sqrt(dot(relative[i],relative[i]))**3
    return B

Ray · Accepted Answer

You can get this to run much, much faster if you eliminate the loop and use Numpy's vectorized operations. Put your data in numpy arrays of shape (3,N) and try the following:

import numpy as np

N = 20000
mu = np.random.random((3,1))
r_i = np.random.random((3,N))
mom_i = np.random.random((3,N))

def unit_vectors(r):
     return r / np.sqrt((r*r).sum(0))

def calculate_dipole(mu, r_i, mom_i):
    relative = mu - r_i
    r_unit = unit_vectors(relative)
    A = 1e-7

    num = A*(3*np.sum(mom_i*r_unit, 0)*r_unit - mom_i)
    den = np.sqrt(np.sum(relative*relative, 0))**3
    B = np.sum(num/den, 1)
    return B

This runs about 50 times faster for me than using a for loop.

whatnick · Answer

Numpy does use some native optimization for array processing. You can use Numpy array with Cython to gain some speed-ups.

Dave Kirby · Answer

Your python code could probably be speeded up a bit by replacing your loop with a generator expression and removing all the lookups of mom_i[i], relative[i] and r_unit[i] by iterating through all three sequences in parallel using itertools.izip.

i.e. replace

B = zeros(3,float)

for i in range(len(relative)):
    #work out the dipole field and add it to the estimate so far
    B += A*(3*dot(mom_i[i],r_unit[i])*r_unit[i] - mom_i[i]) / sqrt(dot(relative[i],relative[i]))**3
return B

with:

from itertools import izip
...
return sum((A*(3*dot(mom,ru)*ru - mom) / sqrt(dot(rel,rel))**3 
            for mom, ru, rel in izip(mom_i, r_unit, relative)),
           zeros(3,float))

This is also more readable IMHO since the core equation is not cluttered with [i] everywhere..

I suspect however that this will only get you marginal gains compared to doing the whole function in a compiled language such as Cython.

What is the most platform- and Python-version-independent way to make a fast loop for use in Python?

Tags:

python

optimization

Andrew Steele

3 Answers

Ray

whatnick

Dave Kirby

Recent Activity

Donate For Us

What is the most platform- and Python-version-independent way to make a fast loop for use in Python?

Tags:

python

optimization

Andrew Steele

3 Answers

Ray

whatnick

Dave Kirby

Related questions

Recent Activity

Donate For Us