Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to calculate the centroid of a set of coordinate tuples in python without numpy

I've been working on a project that is incredibly time sensitive (that unfortunately has to be in python) and one of the functions that is used extensively is a function that calculates the centroid of a list of (x, y) tuples. To illustrate:

def centroid(*points):     x_coords = [p[0] for p in points]     y_coords = [p[1] for p in points]     _len = len(points)     centroid_x = sum(x_coords)/_len     centroid_y = sum(y_coords)/_len     return [centroid_x, centroid_y] 

where

>>> centroid((0, 0), (10, 0), (10, 10), (0, 10)) [5, 5] 

This function runs fairly quickly, the above example completing in an average of 1.49e-05 seconds on my system but I'm looking for the fastest way to calculate the centroid. Do you have any ideas?

One of the other solutions I had was to do the following (where l is the list of tuples):

map(len(l).__rtruediv__, map(sum, zip(*l))) 

Which runs in between 1.01e-05 and 9.6e-06 seconds, but unfortunately converting to a list (by surrounding the whole statement in list( ... )) nearly doubles computation time.

EDIT: Suggestions are welcome in pure python BUT NOT numpy.

EDIT2: Just found out that if a separate variable is kept for the length of the list of tuples, then my above implementation with map runs reliably under 9.2e-06 seconds, but there's still the problem of converting back to a list.

EDIT3:

Now I'm only accepting answers in pure python, NOT in numpy (sorry to those that already answered in numpy!)

like image 789
user3002473 Avatar asked Apr 11 '14 19:04

user3002473


People also ask

How do you find the centroid of a list in Python?

linalg. norm(dataSetRow - centroids[0, :-1]) for i in range(1, centroids. shape[0]): dist = np. linalg.

How do you find the centroid of a set of data?

To calculate the centroid from the cluster table just get the position of all points of a single cluster, sum them up and divide by the number of points.


1 Answers

import numpy as np  data = np.random.randint(0, 10, size=(100000, 2)) 

this here is fast

def centeroidnp(arr):     length = arr.shape[0]     sum_x = np.sum(arr[:, 0])     sum_y = np.sum(arr[:, 1])     return sum_x/length, sum_y/length  %timeit centeroidnp(data) 10000 loops, best of 3: 181 µs per loop 

surprisingly, this is much slower:

%timeit data.mean(axis=0) 1000 loops, best of 3: 1.75 ms per loop 

numpy seems very quick to me...

For completeness:

def centeroidpython(data):     x, y = zip(*data)     l = len(x)     return sum(x) / l, sum(y) / l #take the data conversion out to be fair! data = list(tuple(i) for i in data)  %timeit centeroidpython(data) 10 loops, best of 3: 57 ms per loop 
like image 169
Retozi Avatar answered Sep 29 '22 20:09

Retozi