Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient manipulation of a list of cartesian coordinates in Python

Background:

I'm writing a program which handles large quantities of data related to the networks of vertices of various regular shapes. I have a working generator which produces a list of cartesian coordinates corresponding to the vertices of said shapes based on a range of user input parameters. The data is then passed to filters which clear up duplicate entries, sort the data and various other functions, from where the cleaned data is fed to a canvas module which loops through and draws the vertices.

Question:

I need to implement a new filter that efficiently loops through the coordinates, comparing each pair against every other pair, i.e. (x1,y1)->(x2,y2) to (x1,y1)->(xn,yn), (x2,y2)->(x3,y3) to (x2,y2)->(xn,yn) etc. for all entries and, for example, if the relationship between (x1,y1) and (x5,y5) fits [(x5-x1)^2+(y5-y1)^2]=vertex_spacing^2, then the two sets of coordinates are then paired with their respective list entry numbers and appended to a new list where one entry would be of the form: [(x1,y1), (x5,y5), 0, 4] for example. What is the most efficient method for achieving this?

My Attempts:

I've looked at quite a few methods for handling lists on here and on various guides. I've attempted nested 'for' and 'if' loops, but find while this method can work it leads to excessively long run times, as well as attempting to breaking the problem down into numerous smaller for loops.

Further Notes:

The ultimate aim of this is to use the resulting coordinates for front-end interface elements, and to be saved and imported as necessary. The function of the list positions 0 and 4 in [(x1,y1), (x5,y5), 0, 4] is to enable the interface to group coordinates for later use in canvas objects. The method should be able to process potentially thousands of coordinates.

Thank you in advance for any help, I am of course willing to improve the phrasing/information I've supplied and/or add example code if it is unclear what I am asking in any way- I'm still quite new to this! :)

like image 835
MarkyD43 Avatar asked Aug 17 '13 18:08

MarkyD43


1 Answers

What you're basically checking is:

for each vertex v, find all vertices u such that u is on the circle of radius vertex_spacing around v.

If the distribution of your points is such that not all points are close together, I think of two approaches to speed up the search:

  1. The simplest way to speed up this process would be to sort the points by x-coordinate. That way, you can possible skip many comparisons. As a simple example, assume that the x-coordinates are [1, 2, 10, 15, 18, 20, 21] and vertex_spacing = 5. You only need to compare the first vertex with the second one, because all the other vertices are obviously outside the circle around the first vertex.

    Note that this approach is useless if all points are close together. In other words, if vertex_spacing = 25, you cannot skip any comparison.

  2. Along the same lines, you could use a 2-dimensional k-d tree. This is equivalent to the sorting approach, but in two dimensions. Given a vertex (x, y) and vertex_spacing = v, you would have to check all points in the range ([x-v, x+v], [y-v, y+v]). Using the same example as before, assume the first point had coordinates (1, 0) and the second one (2, 10), there would be no need to compare the first vertex to anything.

Both approaches are heuristics and do not give any improvements on the worst-case running time (quite contrary: you also have the overhead of the sorting / building the k-d tree), but if the vertices are typically at least vertex_space apart, this could speed up your search a lot.

like image 167
Vincent van der Weele Avatar answered Oct 12 '22 06:10

Vincent van der Weele