I am going through my coordinate data and I see some duplicate coordinates with different attribute values, caused by an earlier preprocessing step. I want to merge the attributes of the matching coordinates and get a simplified result. To clarify what I mean, here is an example:
X = [1.0, 2.0, 3.0, 2.0]
Y = [8.0, 3.0, 4.0, 3.0]
A = [13, 16, 20, 8]
The above data is read as follows: point (1.0, 8.0) has a value of 13 and (2.0, 3.0) has a value of 16. Notice that the second point and fourth point have the same coordinates but different attribute values. I want to be able to remove the duplicates from the lists of coordinates and sum the attributes so the results would be new lists:
New_X = [1.0, 2.0, 3.0]
New_Y = [8.0, 3.0, 4.0]
New_A = [13, 24, 20]
Here 24 is the sum of 16 and 8 from the second and fourth points, which share the same coordinates, so one point is kept and the values are summed.
I am not sure how to do this. I thought of using nested for loops over zips of the coordinates, but I am not sure how to formulate them to sum the attributes.
Any help is appreciated!
I think that maintaining 3 lists is a bit awkward. Something like:
D = dict()
for x, y, a in zip(X, Y, A):
    D[(x, y)] = D.get((x, y), 0) + a
would put everything together in one place.
If you'd prefer to decompose it back into 3 lists:
newX, newY, newA = [], [], []  # the output lists must exist before appending
for (x, y), a in D.items():
    newX.append(x)
    newY.append(y)
    newA.append(a)
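For reference, the two snippets above can be combined into one self-contained sketch. The function name merge_duplicates is made up for illustration. It assumes Python 3.7 or later, where dicts keep insertion order, so the merged points come out in the order they first appear in the input.

```python
def merge_duplicates(xs, ys, values):
    """Sum the values of all points that share the same (x, y) coordinates."""
    totals = {}
    for x, y, a in zip(xs, ys, values):
        # The (x, y) tuple is the dictionary key; a missing key starts at 0.
        totals[(x, y)] = totals.get((x, y), 0) + a

    # Decompose the dictionary back into three parallel lists.
    new_x = [x for x, _ in totals]
    new_y = [y for _, y in totals]
    new_a = list(totals.values())
    return new_x, new_y, new_a


X = [1.0, 2.0, 3.0, 2.0]
Y = [8.0, 3.0, 4.0, 3.0]
A = [13, 16, 20, 8]

New_X, New_Y, New_A = merge_duplicates(X, Y, A)
print(New_X)  # [1.0, 2.0, 3.0]
print(New_Y)  # [8.0, 3.0, 4.0]
print(New_A)  # [13, 24, 20]
```

Note that the keys are compared exactly, so two floats that differ only by rounding error would be treated as different points.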
Another option here is to use itertools.groupby. But since this only groups consecutive keys, you'll have to sort your coordinates first.
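To see why the sort matters, here is a small sketch with made-up keys: without sorting, a key that shows up again later starts a new group instead of joining the earlier one.

```python
from itertools import groupby

keys = [(2.0, 3.0), (1.0, 8.0), (2.0, 3.0)]

# groupby only merges runs of equal, adjacent keys.
unsorted_groups = [k for k, _ in groupby(keys)]
sorted_groups = [k for k, _ in groupby(sorted(keys))]

print(unsorted_groups)  # [(2.0, 3.0), (1.0, 8.0), (2.0, 3.0)]
print(sorted_groups)    # [(1.0, 8.0), (2.0, 3.0)]
```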
First we can zip them together to create tuples of the form (x, y, a). Then sort these by the (x, y) coordinates:
sc = sorted(zip(X, Y, A), key=lambda P: (P[0], P[1])) # sorted coordinates
print(sc)
#[(1.0, 8.0, 13), (2.0, 3.0, 16), (2.0, 3.0, 8), (3.0, 4.0, 20)]
Now we can groupby the coordinates and sum the values:
from itertools import groupby
print([(*a, sum(c[2] for c in b)) for a, b in groupby(sc, key=lambda P: (P[0], P[1]))])
#[(1.0, 8.0, 13), (2.0, 3.0, 24), (3.0, 4.0, 20)]
And since zip is its own inverse, you can get New_X, New_Y, and New_A via:
New_X, New_Y, New_A = zip(
*((*a, sum(c[2] for c in b)) for a, b in groupby(sc, key=lambda P: (P[0], P[1])))
)
print(New_X)
print(New_Y)
print(New_A)
#(1.0, 2.0, 3.0)
#(8.0, 3.0, 4.0)
#(13, 24, 20)
Of course, you can do this all in one line but I broke it up into pieces so that it's easier to understand.
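Put together, the groupby steps might look like the sketch below. The function name merge_sorted is hypothetical, and operator.itemgetter is swapped in for the lambda purely as a matter of taste. Unlike a dictionary, this returns the points in sorted coordinate order, not in their original order.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted(xs, ys, values):
    """Sort the points by (x, y), then sum the values within each group."""
    key = itemgetter(0, 1)  # same as lambda P: (P[0], P[1])
    points = sorted(zip(xs, ys, values), key=key)
    return [
        (x, y, sum(p[2] for p in group))
        for (x, y), group in groupby(points, key=key)
    ]


merged = merge_sorted([1.0, 2.0, 3.0, 2.0], [8.0, 3.0, 4.0, 3.0], [13, 16, 20, 8])
print(merged)  # [(1.0, 8.0, 13), (2.0, 3.0, 24), (3.0, 4.0, 20)]
```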