I have a list of coordinates that looks like this:
my_list = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
As you can see, the first three coordinates share the same X value but have different Y values, and the same is true of the other two coordinates. I want to build a new list that looks like this:
new_list = [[1, 3], [2, 2]]
where y1 = 3 = (1+3+5)/3 and y2 = 2 = (1+3)/2.
I have written the code below, but it works slowly.
I work with hundreds of thousands of coordinates, so the question is: how can I make this code faster? Is there an optimization or an open source library that can speed it up?
Thank you in advance.
# 'mass' is the input list of [x, y] pairs (my_list above)
mass = my_list

# collect the distinct X values
x_mass = []
for m in mass:
    x_mass.append(m[0])
set_x_mass = set(x_mass)
list_x_mass = list(set_x_mass)

performance_points = []

def function(i):
    # rescan the whole list to find every point with X == i
    unique_x_mass = []
    for m in mass:
        if m[0] == i:
            unique_x_mass.append(m)
    # average the Y values of that group
    summ_y = 0
    for m in unique_x_mass:
        summ_y += m[1]
    point = [float(m[0]), float(summ_y / len(unique_x_mass))]
    performance_points.append(point)
    return performance_points

for x in list_x_mass:
    function(x)
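For reference, the main cost above is that function rescans the whole list once per distinct X value, which makes the work roughly quadratic. A single pass that groups the Y values by X in a dict avoids that; here is a minimal sketch, assuming the input is the list of [x, y] pairs shown above (the name average_by_x is just illustrative):

from collections import defaultdict

def average_by_x(points):
    # group Y values by their X key in one pass
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    # average each group, keeping the same [x, mean_y] output shape
    return [[float(x), sum(ys) / len(ys)] for x, ys in groups.items()]

my_list = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
print(average_by_x(my_list))  # [[1.0, 3.0], [2.0, 2.0]]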
Create a DataFrame and aggregate with mean:
import pandas as pd

L = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
L1 = pd.DataFrame(L).groupby(0, as_index=False)[1].mean().values.tolist()
print(L1)
[[1, 3], [2, 2]]
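The same approach can be written with named columns, which may read more clearly on real data; the column names x and y here are only illustrative, not required by pandas:

import pandas as pd

L = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
df = pd.DataFrame(L, columns=['x', 'y'])
averaged = df.groupby('x', as_index=False)['y'].mean().values.tolist()
print(averaged)  # x=1 averages to 3, x=2 averages to 2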
The pandas solution offered by @jezrael is elegant but slow (like almost everything in pandas). I would suggest using the itertools and statistics modules:
import operator  # only needed for the commented-out alternative below
from statistics import mean
from itertools import groupby

L = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]

# groupby assumes the list is already ordered by the key (it is here)
grouper = groupby(L, key=lambda x: x[0])
# The next line is again more elegant, but slower:
# grouper = groupby(L, key=operator.itemgetter(0))
[[x, mean(yi[1] for yi in y)] for x, y in grouper]
The result is, of course, the same. The execution time for the sample list is two orders of magnitude faster.
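One caveat worth noting: itertools.groupby only groups consecutive items with equal keys, so if the coordinate list is not already ordered by X (the sample list is), it must be sorted first. A minimal sketch with a hypothetical unsorted input:

from itertools import groupby
from statistics import mean

points = [[2, 1], [1, 5], [1, 1], [2, 3], [1, 3]]  # hypothetical unsorted input

points.sort(key=lambda p: p[0])  # groupby needs equal keys to be adjacent
result = [[x, mean(p[1] for p in grp)] for x, grp in groupby(points, key=lambda p: p[0])]
print(result)  # [[1, 3], [2, 2]]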