I am looking to vectorize a nested loop that works on a list of 300,000 lists, each containing 3 values. The nested loop compares the values of each list with the corresponding values in every other list, and appends the indices of each pair of lists whose corresponding values all differ by less than 0.1. Thus, a list containing [0.234, 0.456, 0.567] and a list containing [0.246, 0.479, 0.580] would fall in this category, since their corresponding values (i.e. 0.234 and 0.246; 0.456 and 0.479; 0.567 and 0.580) each differ by less than 0.1.
I currently use the following nested loop to do this, but it would take approximately 58 hours to complete (300,000 × 300,000 = 90 billion iterations):
import numpy as np

variable = np.random.random((300000, 3)).tolist()
out1 = list()
out2 = list()
for i in range(300000):
    for j in range(300000):
        # Keep each unordered pair once (i < j) when all three coordinates differ by less than 0.1
        if ((i < j) and (abs(variable[i][0] - variable[j][0]) < 0.1)
                and (abs(variable[i][1] - variable[j][1]) < 0.1)
                and (abs(variable[i][2] - variable[j][2]) < 0.1)):
            out1.append(i)
            out2.append(j)
Look into scipy.spatial; it has a lot of functionality for solving such spatial queries efficiently, KD-trees in particular. For example:
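import numpy as np
import scipy.spatial
# p=np.inf selects the Chebyshev (maximum-coordinate) distance, matching the per-coordinate test
out = scipy.spatial.cKDTree(variable).query_pairs(r=0.1, p=np.inf)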
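To get the two index lists from the question (out1 and out2) and to convince yourself the tree query agrees with the loop, a small sanity check along these lines may help. This is only a sketch: the helper names close_pairs and close_pairs_bruteforce are made up for illustration, the sample is cut down to 500 points so the brute-force comparison fits in memory, and it assumes a SciPy version whose query_pairs accepts output_type="ndarray". Note that query_pairs keeps pairs at distance at most r, whereas the loop uses a strict < 0.1; for random floats the difference should not matter in practice.

```python
import numpy as np
from scipy.spatial import cKDTree


def close_pairs(points, tol=0.1):
    """Return index arrays (i, j) of rows whose coordinates all differ by at most tol."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    # p=np.inf is the Chebyshev distance: the largest per-coordinate difference.
    pairs = tree.query_pairs(r=tol, p=np.inf, output_type="ndarray")
    return pairs[:, 0], pairs[:, 1]


def close_pairs_bruteforce(points, tol=0.1):
    """Reference O(n^2) version using broadcasting; only suitable for small n."""
    points = np.asarray(points, dtype=float)
    # Largest per-coordinate difference for every pair of rows.
    diff = np.abs(points[:, None, :] - points[None, :, :]).max(axis=2)
    # Upper triangle (k=1) keeps each unordered pair once, with i < j.
    return np.nonzero(np.triu(diff <= tol, k=1))


rng = np.random.default_rng(0)
sample = rng.random((500, 3))  # small stand-in for the 300,000 x 3 data
out1, out2 = close_pairs(sample)
```

On the full 300,000 points the number of matching pairs will be very large for uniformly random data, so the array output is likely to be much lighter on memory than the default Python set of tuples.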