I'm relatively new to Python and I've got a nested for loop. Since the for loops take a while to run, I'm trying to figure out a way to vectorize this code so it can run faster.
In this case, coord is a 3-dimensional array where coord[x, 0, 0] and coord[x, 0, 1] are integers and coord[x, 0, 2] is either 0 or 1. H is a SciPy sparse matrix and x_dist, y_dist, z_dist, and a are all floats.
# x_dist, y_dist, and z_dist are floats
# coord is a num x 1 x 3 numpy array where num can go into the hundreds of thousands
num = coord.shape[0]
H = sparse.lil_matrix((num, num))
for i in xrange(num):
for j in xrange(num):
if (np.absolute(coord[i, 0, 0] - coord[j, 0, 0]) <= 2 and
(np.absolute(coord[i, 0, 1] - coord[j, 0, 1]) <= 1)):
x = ((coord[i, 0, 0] * x_dist + coord[i, 0, 2] * z_dist) -
(coord[j, 0, 0] * x_dist + coord[j, 0, 2] * z_dist))
y = (coord[i, 0, 1] * y_dist) - (coord[j, 0, 1] * y_dist)
if a - 0.5 <= np.sqrt(x ** 2 + y ** 2) <= a + 0.5:
H[i, j] = -2.7
I've also read that broadcasting with NumPy, while much faster, can lead to large amounts of memory usage from temporary arrays. Would it be better to go the vectorization route or try and use something like Cython?
This is how I would vectorize your code, some discussion on the caveats later:
import numpy as np
import scipy.sparse as sps
idx = ((np.abs(coord[:, 0, 0] - coord[:, 0, 0, None]) <= 2) &
(np.abs(coord[:, 0, 1] - coord[:, 0, 1, None]) <= 1))
rows, cols = np.nonzero(idx)
x = ((coord[rows, 0, 0]-coord[cols, 0, 0]) * x_dist +
(coord[rows, 0, 2]-coord[cols, 0, 2]) * z_dist)
y = (coord[rows, 0, 1]-coord[cols, 0, 1]) * y_dist
r2 = x*x + y*y
idx = ((a - 0.5)**2 <= r2) & (r2 <= (a + 0.5)**2)
rows, cols = rows[idx], cols[idx]
data = np.repeat(2.7, len(rows))
H = sps.coo_matrix((data, (rows, cols)), shape=(num, num)).tolil()
As you noted, the issues are going to come with the first idx
array, as it will be of shape (num, num)
, so it will probably blow your memory to pieces if num
is "into the hundreds of thousands."
One potential solution is to break down your problem into manageable chunks. If you have a 100,000 element array, you can split it into 100 chunks of 1,000 elements, and run a modified version of the code above for each of the 10,000 combinations of chunks. You would only need a 1,000,000 element idx
array (which you could pre-allocate and reuse for better performance), and you would have a loop of only 10,000 iterations, instead of the 10,000,000,000 of your current implementation. It is sort of a poor man's parallelization scheme, which you can actually improve on by having several of those chunks processed in parallel if you have a multi-core machine.
The nature of the calculation makes it tough to vectorize with numpy methods I'm familiar with. I think the best solution in terms of speed and memory usage would be cython. However, you can get some speed-up using numba. Here is an example (note that normally you use autojit
as a decorator):
import numpy as np
from scipy import sparse
import time
from numba.decorators import autojit
x_dist=.5
y_dist = .5
z_dist = .4
a = .6
coord = np.random.normal(size=(1000,1000,1000))
def run(coord, x_dist,y_dist, z_dist, a):
num = coord.shape[0]
H = sparse.lil_matrix((num, num))
for i in xrange(num):
for j in xrange(num):
if (np.absolute(coord[i, 0, 0] - coord[j, 0, 0]) <= 2 and
(np.absolute(coord[i, 0, 1] - coord[j, 0, 1]) <= 1)):
x = ((coord[i, 0, 0] * x_dist + coord[i, 0, 2] * z_dist) -
(coord[j, 0, 0] * x_dist + coord[j, 0, 2] * z_dist))
y = (coord[i, 0, 1] * y_dist) - (coord[j, 0, 1] * y_dist)
if a - 0.5 <= np.sqrt(x ** 2 + y ** 2) <= a + 0.5:
H[i, j] = -2.7
return H
runaj = autojit(run)
t0 = time.time()
run(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'First Original Runtime:', t1 - t0
t0 = time.time()
run(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Second Original Runtime:', t1 - t0
t0 = time.time()
run(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Third Original Runtime:', t1 - t0
t0 = time.time()
runaj(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'First Numba Runtime:', t1 - t0
t0 = time.time()
runaj(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Second Numba Runtime:', t1 - t0
t0 = time.time()
runaj(coord,x_dist,y_dist, z_dist, a)
t1 = time.time()
print 'Third Numba Runtime:', t1 - t0
I get this output:
First Original Runtime: 21.3574919701
Second Original Runtime: 15.7615520954
Third Original Runtime: 15.3634860516
First Numba Runtime: 9.87108802795
Second Numba Runtime: 9.32944011688
Third Numba Runtime: 9.32300305367
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With