I have been working on a function that generates parameters for a simulation code I am developing, and I have hit a wall trying to improve its performance.
Profiling shows that this function is the main bottleneck, so any enhancement, however minor, would help.
I wanted to vectorize parts of the function, but I am not sure that is possible. The main challenge is that the values stored in my array params depend upon the indices of params. The only straightforward solution I found was np.ndenumerate, but that turns out to be quite slow.
Is it possible to vectorize this type of operation, where the values stored in the array depend on where they are being stored? Or would it be smarter/faster to write a generator that just yields the tuples of array indices?
import numpy as np
from scipy.sparse import linalg as LA
def get_params(num_bonds, energies):
    """
    Returns the interaction parameters of different pairs of atoms.

    Parameters
    ----------
    num_bonds : ndarray, shape = (M, 20)
        Sparse array containing the number of nearest neighbor bonds for
        different pairs of atoms (denoted by their column) and next-
        nearest neighbor bonds. Columns 0-9 contain nearest neighbors,
        10-19 contain next-nearest neighbors.
    energies : ndarray, shape = (M, )
        Energy vector corresponding to each atomic system stored in each
        row of num_bonds.
    """
    # -- Compute the bond energies
    x = LA.lsqr(num_bonds, energies, show=False)[0]

    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])

    nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
          (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
          (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
          (3,2): x[9]}

    nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14],
           (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16],
           (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19],
           (3,2): x[19]}

    """
    params contains the energy contribution of each site due to its
    local environment. The shape is given by the number of possible atom
    types and the number of sites in the lattice.
    """
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):
        params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                        nn[(i,m)] + nnn[(i,jj)] + \
                                        nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]

    return np.ascontiguousarray(params)
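For context, here is roughly how the function gets called; the array contents below are just illustrative stand-ins for my real simulation data. Note that params has 4**9 = 262,144 entries, so the np.ndenumerate loop runs that many Python-level iterations, which is where the time goes.

M = 100   # number of atomic systems (illustrative only)
num_bonds = np.random.randint(0, 5, size=(M, 20)).astype(float)
energies = np.random.rand(M)

params = get_params(num_bonds, energies)
print(params.shape)   # (4, 4, 4, 4, 4, 4, 4, 4, 4)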
Here's a vectorized approach using broadcasted summations -
# Gather the dict values into dense 4x4 lookup tables for both nn and nnn,
# ordering them by their (row, col) keys
# (list(...) around the dict views so this also runs on Python 3)
sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T, (4,4)).argsort()
a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)

sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T, (4,4)).argsort()
a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)

# Perform the summations, keeping the first axis aligned for the nn and nnn parts
parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
         a0[:,None,None,:,None] + a0[:,None,None,None,:]

parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
         a1[:,None,None,:,None] + a1[:,None,None,None,:]

# Finally, add up the sums from the nn and nnn parts for the final output
out = parte0[...,None,None,None,None] + parte1[:,None,None,None,None]
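As a side note, if you fold this back into get_params, the two dicts aren't strictly needed either: the same 4x4 lookup tables can be filled directly from x. A sketch, assuming the coefficient ordering used by the dicts above (the four like-pair terms first, then the mixed pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)); bond_table is just a helper name introduced here for illustration -

def bond_table(coeffs):
    # coeffs[:4] hold the like-pair terms (0,0)..(3,3); coeffs[4:10] hold the
    # mixed pairs in (0,1),(0,2),(0,3),(1,2),(1,3),(2,3) order, as in the dicts
    t = np.zeros((4, 4))
    t[np.diag_indices(4)] = coeffs[:4]
    iu = np.triu_indices(4, k=1)
    t[iu] = coeffs[4:10]
    t[iu[::-1]] = coeffs[4:10]   # mirror into the lower triangle (pairs are symmetric)
    return t

a0 = bond_table(x[:10])      # nearest-neighbor table
a1 = bond_table(x[10:20])    # next-nearest-neighbor table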
Runtime test
Function definitions -
def vectorized_approach(nn, nnn):
    # Build the dense 4x4 lookup tables from the dicts
    sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T, (4,4)).argsort()
    a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)
    sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T, (4,4)).argsort()
    a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)
    # Broadcasted summations over the four nn sites and the four nnn sites
    parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
             a0[:,None,None,:,None] + a0[:,None,None,None,:]
    parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
             a1[:,None,None,:,None] + a1[:,None,None,None,:]
    return parte0[...,None,None,None,None] + parte1[:,None,None,None,None]
def original_approach(nn, nnn):
    params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])
    for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):
        params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                        nn[(i,m)] + nnn[(i,jj)] + \
                                        nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]
    return params
Setup inputs -
# Setup inputs
x = np.random.rand(30)
nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4],
(1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6],
(1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9],
(3,2): x[9]}
nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14],
(1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16],
(1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19],
(3,2): x[19]}
Timings -
In [98]: np.allclose(original_approach(nn,nnn),vectorized_approach(nn,nnn))
Out[98]: True
In [99]: %timeit original_approach(nn,nnn)
1 loops, best of 3: 884 ms per loop
In [100]: %timeit vectorized_approach(nn,nnn)
1000 loops, best of 3: 708 µs per loop
Welcome to a 1000x+ speedup!
For a system with a generic number of such outer products, here's a general solution that iterates over those dimensions -
m, n = a0.shape  # number of atom types; each axis of the output has length n
N = 4            # order of the system: number of sites summed per part

out = a0.copy()
for i in range(1, N):
    # i trailing axes already exist, so pad a0 with i singleton dimensions
    out = out[..., None] + a0.reshape((m,) + (1,)*i + (n,))
for i in range(N):
    # N axes from the nn part plus i from this loop precede the new axis
    out = out[..., None] + a1.reshape((m,) + (1,)*(i + N) + (n,))
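As a quick sanity check (assuming a0, a1 and out from this block and nn, nnn, vectorized_approach from earlier are in scope), the iterative build should reproduce the explicit broadcast version:

print(out.shape)                                        # (4, 4, 4, 4, 4, 4, 4, 4, 4)
print(np.allclose(out, vectorized_approach(nn, nnn)))   # True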