Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing Python KD Tree Searches

Scipy (http://www.scipy.org/) offers two KD Tree classes; the KDTree and the cKDTree.

The cKDTree is much faster, but is less customizable and query-able than the KDTree (as far as I can tell from the docs).

Here is my problem: I have a list of 3million 2 dimensional (X,Y) points. I need to return all of the points within a distance of X units from every point.

With the KDtree, there is an option to do just this: KDtree.query_ball_tree() It generates a list of lists of all the points within X units from every other point. HOWEVER: this list is enormous and quickly fills up my virtual memory (about 744 million items long).

Potential solution #1: Is there a way to parse this list into a text file as it is writing?

Potential solution #2: I have tried using a for loop (for every point in the list) and then finding that single point's neighbors within X units by employing: KDtree.query_ball_point(). HOWEVER: this takes forever as it needs to run the query millions of times. Is there a cKDTree equivalent of this KDTree tool?

Potential solution #3: Beats me, anyone else have any ideas?

like image 438
Dlinet Avatar asked Oct 25 '12 23:10

Dlinet


1 Answers

From scipy 0.12 on, both KD Tree classes have feature parity. Quoting its announcement:

cKDTree feature-complete

Cython version of KDTree, cKDTree, is now feature-complete. Most operations (construction, query, query_ball_point, query_pairs, count_neighbors and sparse_distance_matrix) are between 200 and 1000 times faster in cKDTree than in KDTree. With very minor caveats, cKDTree has exactly the same interface as KDTree, and can be used as a drop-in replacement.

like image 134
jorgeca Avatar answered Sep 22 '22 22:09

jorgeca