
Sum attributes of duplicate coordinates in python

I am going through my coordinate data and I see some duplicate coordinates with different attribute values, a side effect of earlier preprocessing. I want to merge the attributes that belong to matching coordinates and get a simplified result. To clarify what I mean, here is an example:

X = [1.0, 2.0, 3.0, 2.0]
Y = [8.0, 3.0, 4.0, 3.0]
A = [13, 16, 20, 8]

The above data is read as follows: point (1.0, 8.0) has a value of 13 and (2.0, 3.0) has a value of 16. Notice that the second point and fourth point have the same coordinates but different attribute values. I want to be able to remove the duplicates from the lists of coordinates and sum the attributes so the results would be new lists:

New_X = [1.0, 2.0, 3.0]
New_Y = [8.0, 3.0, 4.0]
New_A = [13, 24, 20]

24 is the sum of 16 and 8 from the second and fourth points, which share the same coordinates, so only one point is kept and its values are summed.

I am not sure how to do this. I thought of using nested for loops over zips of the coordinates, but I am not sure how to formulate them so that the attributes are summed.

Any help is appreciated!

mb567 asked Jun 27 '18 19:06


2 Answers

I think that maintaining 3 lists is a bit awkward. Something like:

D = dict()
for x,y,a in zip(X,Y,A):
    D[(x,y)] = D.get((x,y),0) + a

would put everything together in one place.

If you'd prefer to decompose it back into 3 lists:

newX, newY, newA = [], [], []  # the output lists must exist before appending
for (x,y),a in D.items():
    newX.append(x)
    newY.append(y)
    newA.append(a)
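
As a self-contained sketch of the two steps above, the accumulation and the decomposition can be wrapped in one function. The name merge_duplicates and its parameter names are made up for illustration. The sketch assumes Python 3.7+, where dicts keep insertion order, so each point stays at the position of its first occurrence. It also assumes duplicates are exactly equal floats; nearly equal coordinates would not be merged.

```python
def merge_duplicates(xs, ys, values):
    """Sum the values of points that share the same (x, y) coordinates."""
    totals = {}
    for x, y, value in zip(xs, ys, values):
        # Start each new coordinate at 0, then add this point's value.
        totals[(x, y)] = totals.get((x, y), 0) + value

    # Iterating a dict yields its keys, here the (x, y) tuples,
    # in insertion order (guaranteed from Python 3.7 on).
    new_x = [x for x, _ in totals]
    new_y = [y for _, y in totals]
    new_a = list(totals.values())
    return new_x, new_y, new_a


X = [1.0, 2.0, 3.0, 2.0]
Y = [8.0, 3.0, 4.0, 3.0]
A = [13, 16, 20, 8]

New_X, New_Y, New_A = merge_duplicates(X, Y, A)
print(New_X)  # [1.0, 2.0, 3.0]
print(New_Y)  # [8.0, 3.0, 4.0]
print(New_A)  # [13, 24, 20]
```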
dashiell answered Sep 27 '22 21:09


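Another option here is to use itertools.groupby. Because groupby only groups consecutive items that share a key, you have to sort your coordinates first. Note that the results then come back in sorted coordinate order rather than in the original order.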
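
To see why the sort matters, here is a small sketch that runs groupby over the question's data with and without sorting and collects only the group keys. The helper name coordinate is made up for illustration.

```python
from itertools import groupby

X = [1.0, 2.0, 3.0, 2.0]
Y = [8.0, 3.0, 4.0, 3.0]
A = [13, 16, 20, 8]


def coordinate(point):
    """Return the (x, y) part of an (x, y, a) tuple."""
    return (point[0], point[1])


# Without sorting, the two (2.0, 3.0) points are not adjacent,
# so groupby reports that key twice.
unsorted_keys = [key for key, _ in groupby(zip(X, Y, A), key=coordinate)]

# After sorting by coordinate, equal keys sit next to each other
# and each key is reported once.
sorted_points = sorted(zip(X, Y, A), key=coordinate)
sorted_keys = [key for key, _ in groupby(sorted_points, key=coordinate)]

print(unsorted_keys)  # [(1.0, 8.0), (2.0, 3.0), (3.0, 4.0), (2.0, 3.0)]
print(sorted_keys)    # [(1.0, 8.0), (2.0, 3.0), (3.0, 4.0)]
```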

First we can zip them together to create tuples of the form (x, y, a). Then sort these by the (x, y) coordinates:

sc = sorted(zip(X, Y, A), key=lambda P: (P[0], P[1]))  # sorted coordinates
print(sc)
#[(1.0, 8.0, 13), (2.0, 3.0, 16), (2.0, 3.0, 8), (3.0, 4.0, 20)]

Now we can groupby the coordinates and sum the values:

from itertools import groupby
print([(*a, sum(c[2] for c in b)) for a, b in groupby(sc, key=lambda P: (P[0], P[1]))])
#[(1.0, 8.0, 13), (2.0, 3.0, 24), (3.0, 4.0, 20)]

And since zip combined with argument unpacking (zip(*...)) is its own inverse, you can get New_X, New_Y, and New_A via:

New_X, New_Y, New_A = zip(
    *((*a, sum(c[2] for c in b)) for a, b in groupby(sc, key=lambda P: (P[0], P[1])))
)
print(New_X)
print(New_Y)
print(New_A)
#(1.0, 2.0, 3.0)
#(8.0, 3.0, 4.0)
#(13, 24, 20)

Of course, you can do this all in one line but I broke it up into pieces so that it's easier to understand.
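
For reuse, the pieces above could also be combined into a single function. This is only a sketch: the name sum_duplicates_sorted is made up, operator.itemgetter replaces the lambda key purely as a stylistic choice, and the empty-input branch is an added assumption, since unpacking an empty zip into three names would otherwise fail.

```python
from itertools import groupby
from operator import itemgetter


def sum_duplicates_sorted(xs, ys, values):
    """Merge duplicate (x, y) points by sorting, grouping, and summing."""
    coordinate = itemgetter(0, 1)  # picks (x, y) out of an (x, y, a) tuple
    points = sorted(zip(xs, ys, values), key=coordinate)
    merged = [
        (x, y, sum(point[2] for point in group))
        for (x, y), group in groupby(points, key=coordinate)
    ]
    if not merged:
        # Nothing to unzip; return three empty tuples instead.
        return (), (), ()
    # zip(*merged) turns the list of (x, y, a) rows back into three columns.
    return tuple(zip(*merged))


X = [1.0, 2.0, 3.0, 2.0]
Y = [8.0, 3.0, 4.0, 3.0]
A = [13, 16, 20, 8]

New_X, New_Y, New_A = sum_duplicates_sorted(X, Y, A)
print(New_X)  # (1.0, 2.0, 3.0)
print(New_Y)  # (8.0, 3.0, 4.0)
print(New_A)  # (13, 24, 20)
```

The results come back as tuples ordered by coordinate, which matches the original order here only because the sample data happens to be nearly sorted already.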

pault answered Sep 27 '22 22:09