improve nested loop performance

Question

I am doing a molecular dynamics simulation of an Argon liquid in Python. I have a stable version running, however it runs slowly for more than 100 atoms. I identified the bottleneck to be the following nested for loop. It's a force calculation put inside a function that is called from my main.py script:

def computeForce(currentPositions):
    potentialEnergy = 0
    force = zeros((NUMBER_PARTICLES,3))
    for iParticle in range(0,NUMBER_PARTICLES-1):
        for jParticle in range(iParticle + 1, NUMBER_PARTICLES):
            distance = currentPositions[iParticle] - currentPositions[jParticle]
            distance = distance - BOX_LENGTH * (distance/BOX_LENGTH).round()
            #note: this is so much faster than scipy.dot()
            distanceSquared = distance[0]*distance[0] + distance[1]*distance[1] + distance[2]*distance[2]            
            if distanceSquared < CUT_OFF_RADIUS_SQUARED:
                r2i = 1. / distanceSquared
                r6i = r2i*r2i*r2i
                lennardJones = 48. * r2i * r6i * (r6i - 0.5)
                force[iParticle] += lennardJones*distance
                force[jParticle] -= lennardJones*distance
                potentialEnergy += 4.* r6i * (r6i - 1.) - CUT_OFF_ENERGY
return(force,potentialEnergy)

the variables in CAPITAL letters are constant, defined in a config.py file. "currentPositions" is a 3 by number-of-particles matrix.

I have already implemented a version of the nested for loop with scipy.weave, which was inspired from this website: http://www.scipy.org/PerformancePython.

However, I don't like the loss of flexibility. I'm interested in "vectorizing" this for loop. I just don't really get how that works. Can anybody give me a clue or a good tutorial that teaches this?

sega_sai · Accepted Answer

Below is my vectorized version of your code. For a dataset with 1000 points my code is roughly 50 times faster then the original:

In [89]: xyz = 30 * np.random.uniform(size=(1000, 3))

In [90]: %timeit a0, b0 = computeForce(xyz)
1 loops, best of 3: 7.61 s per loop

In [91]: %timeit a, b = computeForceVector(xyz)
10 loops, best of 3: 139 ms per loop

The code:

from numpy import zeros

NUMBER_PARTICLES = 1000
BOX_LENGTH = 100
CUT_OFF_ENERGY = 1
CUT_OFF_RADIUS_SQUARED = 100

def computeForceVector(currentPositions):
    potentialEnergy = 0
    force = zeros((NUMBER_PARTICLES, 3))
    for iParticle in range(0, NUMBER_PARTICLES - 1):
        positionsJ =  currentPositions[iParticle + 1:, :]
        distance = currentPositions[iParticle, :] - positionsJ
        distance = distance - BOX_LENGTH * (distance / BOX_LENGTH).round()
        distanceSquared = (distance**2).sum(axis=1)
        ind = distanceSquared < CUT_OFF_RADIUS_SQUARED

        if ind.any():
            r2i = 1. / distanceSquared[ind]
            r6i = r2i * r2i * r2i
            lennardJones = 48. * r2i * r6i * (r6i - 0.5)
            ljdist = lennardJones[:, None] * distance[ind, :]
            force[iParticle, :] += (ljdist).sum(axis=0)
            force[iParticle+1:, :][ind, :] -= ljdist
            potentialEnergy += (4.* r6i * (r6i - 1.) - CUT_OFF_ENERGY).sum()
    return (force, potentialEnergy)

I've also checked that the code produces the same results

JoshAdel · Answer

Writing something like a MD engine in pure python is going to be slow. I would take a look at either Numba (http://numba.pydata.org/) or Cython (http://cython.org/). On the Cython side, I've written a simple Langevin Dynamics engine using cython that might serve as an example to get you started:

https://bitbucket.org/joshua.adelman/pylangevin-integrator

Another option, which I like quite a bit, is to use OpenMM. There is a python wrapper that allows you to put together all of the pieces of an MD engine, implement custom forces, etc. It also has the ability to target GPU devices:

https://simtk.org/home/openmm

In general though, there are so many highly tuned MD codes available, that unless you are doing this for some sort of general educational purpose, it doesn't make sense to write your own from scratch. Some of the major codes, just to name a few:

NAMD - http://www.ks.uiuc.edu/Research/namd/ (free)
Gromacs - http://www.gromacs.org/ (free)
Amber - http://ambermd.org/
LAMMPS - http://lammps.sandia.gov/ (free)
HOOMD - http://codeblue.umich.edu/hoomd-blue/ (free)
OpenMM - https://simtk.org/home/openmm (free)
Charmm - http://www.charmm.org/

seb · Answer

just to make this post complete, I add my implementation in weaved in C code. note that you need to import weave and converters for this to work. moreover, weave only works with python 2.7 right now. thanks again for all the help! this runs another 10 times faster than the vectorized version.

from scipy import weave
from scipy.weave import converters
def computeForceC(currentPositions):        
    code = """
    using namespace blitz;
    Array<double,1> distance(3);
    double distanceSquared, r2i, r6i, lennardJones;
    double potentialEnergy = 0.;

    for( int iParticle = 0; iParticle < (NUMBER_PARTICLES - 1); iParticle++){
        for( int jParticle = iParticle + 1; jParticle < NUMBER_PARTICLES; jParticle++){
            distance(0) = currentPositions(iParticle,0)-currentPositions(jParticle,0);
            distance(0) = distance(0) - BOX_LENGTH * round(distance(0)/BOX_LENGTH);
            distance(1) = currentPositions(iParticle,1)-currentPositions(jParticle,1);
            distance(1) = distance(1) - BOX_LENGTH * round(distance(1)/BOX_LENGTH);
            distance(2) = currentPositions(iParticle,2)-currentPositions(jParticle,2);
            distance(2) = distance(2) - BOX_LENGTH * round(distance(2)/BOX_LENGTH);
            distanceSquared = distance(0)*distance(0) + distance(1)*distance(1) + distance(2)*distance(2);
            if(distanceSquared < CUT_OFF_RADIUS_SQUARED){
                r2i = 1./distanceSquared;
                r6i = r2i * r2i * r2i;
                lennardJones = 48. * r2i * r6i * (r6i - 0.5);
                force(iParticle,0) += lennardJones*distance(0);
                force(iParticle,1) += lennardJones*distance(1);
                force(iParticle,2) += lennardJones*distance(2);
                force(jParticle,0) -= lennardJones*distance(0);
                force(jParticle,1) -= lennardJones*distance(1);
                force(jParticle,2) -= lennardJones*distance(2);
                potentialEnergy += 4.* r6i * (r6i - 1.)-CUT_OFF_ENERGY;

                }

            }//end inner for loop
    }//end outer for loop
    return_val = potentialEnergy;

    """
    #args that are passed into weave.inline and created inside computeForce
    #potentialEnergy = 0.
    force = zeros((NUMBER_PARTICLES,3))

    #all args
    arguments = ['currentPositions','force','NUMBER_PARTICLES','CUT_OFF_RADIUS_SQUARED','BOX_LENGTH','CUT_OFF_ENERGY']
    #evaluate stuff in code
    potentialEnergy = weave.inline(code,arguments,type_converters = converters.blitz,compiler = 'gcc')    

    return force, potentialEnergy

improve nested loop performance

Tags:

python

loops

nested

numpy

seb

3 Answers

sega_sai

JoshAdel

seb

Recent Activity

Donate For Us

improve nested loop performance

Tags:

python

loops

nested

numpy

seb

3 Answers

sega_sai

JoshAdel

seb

Related questions

Recent Activity

Donate For Us