 

Vectorizing a Nested Loop

I am looking to vectorize a nested loop that works on a list of 300,000 lists, each containing 3 values. The loop compares the values of each list with the corresponding values in every other list, and records only the pairs of list indices whose corresponding values all differ by less than 0.1. For example, [0.234, 0.456, 0.567] and [0.246, 0.479, 0.580] form such a pair, since each pair of corresponding values (0.234 and 0.246; 0.456 and 0.479; 0.567 and 0.580) differs by less than 0.1.
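The matching rule above is equivalent to asking whether the Chebyshev (L-infinity) distance between two rows, i.e. their largest per-coordinate difference, is below 0.1. A minimal NumPy sketch that checks the example pair from the question (the variable names are illustrative only):

```python
import numpy as np

# The two example rows from the question.
a = np.array([0.234, 0.456, 0.567])
b = np.array([0.246, 0.479, 0.580])

# Absolute difference of each pair of corresponding values.
diffs = np.abs(a - b)

# "Every coordinate differs by less than 0.1" is the same as
# "the largest coordinate difference (Chebyshev distance) is below 0.1".
chebyshev = diffs.max()
is_match = bool(chebyshev < 0.1)

# np.linalg.norm with ord=np.inf computes the same max-abs value for a vector.
same_via_norm = np.linalg.norm(a - b, ord=np.inf)

print(diffs, chebyshev, is_match)
```

Framing the test as a distance bound is what lets spatial data structures answer it without comparing every pair explicitly.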

I currently use the following nested loop, but it would take approximately 58 hours to complete (300,000 × 300,000 = 90 billion iterations):

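import numpy as np

# 300,000 rows of 3 random floats, converted to a list of lists
variable = np.random.random((300000, 3)).tolist()
out1 = list()  # first index i of each matching pair
out2 = list()  # second index j of each matching pair
for i in range(300000):
    for j in range(300000):
        # keep each pair once (i < j), and only if every coordinate differs by < 0.1
        if (i < j
                and abs(variable[i][0] - variable[j][0]) < 0.1
                and abs(variable[i][1] - variable[j][1]) < 0.1
                and abs(variable[i][2] - variable[j][2]) < 0.1):
            out1.append(i)
            out2.append(j)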
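One way to vectorize the loop itself, sketched under stated assumptions: process the rows in blocks and compare each block against all later rows with NumPy broadcasting. The helper name `close_pairs_chunked` and the `chunk` parameter are hypothetical, introduced only for this sketch. It still performs about n(n-1)/2 ≈ 4.5×10^10 comparisons for n = 300,000, so it removes Python-loop overhead rather than the quadratic cost; with `chunk=128` and n = 300,000 each block's intermediate array is roughly 300 MB.

```python
import numpy as np


def close_pairs_chunked(points, tol=0.1, chunk=128):
    """Hypothetical helper: return (out1, out2), the index pairs i < j whose
    rows differ by less than `tol` in every coordinate.

    Rows are processed in blocks of `chunk`; each block is compared against
    all rows from the block's start onward using NumPy broadcasting.
    """
    points = np.asarray(points, dtype=float)
    n, dims = points.shape
    out1 = [np.empty(0, dtype=np.intp)]
    out2 = [np.empty(0, dtype=np.intp)]
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        block = points[start:stop]   # rows start..stop-1, shape (b, dims)
        rest = points[start:]        # rows start..n-1, shape (m, dims)
        # mask[a, c] is True when block row a and rest row c differ by < tol
        # in every coordinate; built one coordinate at a time to limit memory.
        mask = np.ones((stop - start, n - start), dtype=bool)
        for k in range(dims):
            mask &= np.abs(block[:, k, None] - rest[None, :, k]) < tol
        ii, jj = np.nonzero(mask)
        ii += start                  # convert to global row indices
        jj += start
        keep = ii < jj               # same i < j rule as the nested loop
        out1.append(ii[keep])
        out2.append(jj[keep])
    return np.concatenate(out1), np.concatenate(out2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((2000, 3))
    out1, out2 = close_pairs_chunked(pts)
    print(len(out1), "matching pairs")
```

Because `np.nonzero` returns indices in row-major order and blocks are visited in increasing order, the pairs come out sorted by i and then j, the same order the nested loop appends them in.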
JBorg asked Aug 03 '16 at 14:08



1 Answer

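Look into scipy.spatial: it has a lot of functionality for solving spatial queries like this one efficiently, KD-trees in particular. The condition "every coordinate differs by less than 0.1" is a bound on the Chebyshev (L-infinity) distance, so a single pair query with p set to infinity does the job, e.g.: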

import numpy as np
import scipy.spatial
# p=np.inf means Chebyshev distance; returns a set of (i, j) pairs with i < j
out = scipy.spatial.cKDTree(variable).query_pairs(r=0.1, p=np.inf)
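A sketch of how the KD-tree result can be turned back into the question's `out1`/`out2` form, assuming a reasonably recent SciPy in which `query_pairs` accepts `output_type="ndarray"`. The helper name `close_pairs_kdtree` is hypothetical. Two caveats: `query_pairs` uses distance <= r rather than the loop's strict < 0.1 (a tie is practically impossible with random floats), and for 300,000 uniform random rows the expected number of matches is about 4.5×10^10 × 0.19³ ≈ 3×10^8 pairs, so the output alone takes several gigabytes even as an integer array, and far more as a set of Python tuples.

```python
import numpy as np
from scipy.spatial import cKDTree


def close_pairs_kdtree(points, tol=0.1):
    """Hypothetical helper: return (out1, out2), the index pairs i < j whose
    rows lie within `tol` of each other in every coordinate."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    # p=np.inf selects the Chebyshev (max-coordinate) distance;
    # query_pairs reports pairs whose distance is <= r.
    pairs = tree.query_pairs(r=tol, p=np.inf, output_type="ndarray")
    pairs = np.asarray(pairs, dtype=np.intp).reshape(-1, 2)
    # Put the smaller index first in each pair, then sort by i and then j
    # to mirror the order the nested loop produces.
    pairs = np.sort(pairs, axis=1)
    order = np.lexsort((pairs[:, 1], pairs[:, 0]))
    pairs = pairs[order]
    return pairs[:, 0], pairs[:, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    variable = rng.random((20000, 3))
    out1, out2 = close_pairs_kdtree(variable)
    print(len(out1), "matching pairs")
```

The tree only visits neighbouring regions of space, so the work grows with the number of matches rather than with every possible pair.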
Eelco Hoogendoorn answered Oct 01 '22 at 22:10
