I have a df of coordinates representing points at various times. I want to calculate the average position of these points in relation to each other.
To achieve this, I'm aiming to calculate the offset between each point and every other point at each time. I'm then hoping to average these offsets.
The following calculates the distance between each pair of points.
import pandas as pd
from scipy.spatial import distance
import itertools
df = pd.DataFrame({
    'Time' : [1,1,1,2,2,2,3,3,3],
    'id' : ['A','B','C','A','B','C','A','B','C'],
    'X' : [1.0,3.0,2.0,2.0,4.0,3.0,3.0,5.0,4.0],
    'Y' : [1.0,1.0,0.5,2.0,2.0,2.5,3.0,3.0,3.0],
})
ids = list(df['id'])
# get the points
points = df[["X", "Y"]].values
# calculate distance of each point from every other point.
# row i contains distances for point i.
# distances[i, j] contains distance of point i from point j.
distances = distance.cdist(points, points, "euclidean")
distances = distances.flatten()
# get the start and end points
cartesian = list(itertools.product(ids, ids))
data = dict(
    start_region = [x[0] for x in cartesian],
    end_region = [x[1] for x in cartesian],
    distance = distances
)
df1 = pd.DataFrame(data)
All I really need to output is:
   Time start_point end_point    X    Y
0     1           A         B  2.0  0.0
1     1           A         C  1.0 -0.5
2     1           B         C -1.0 -0.5
3     2           A         B  2.0  0.0
4     2           A         C  1.0  0.5
5     2           B         C -1.0  0.5
6     3           A         B  2.0  0.0
7     3           A         C  1.0  0.0
8     3           B         C -1.0  0.0

So the average position of these points in relation to each other would be the green coordinates.
But if I average the dataset above it displays:

I understand how this occurs. It's not referencing the other points.
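Here is my take on it:
import itertools

def relative_dist(gp):
    # every unordered pair of ids within one Time group
    combs = list(itertools.combinations(gp.index, 2))
    # diff() leaves NaN in the first row and (end - start) in the second;
    # dropna() keeps only the second row of each pair
    df_gp = pd.concat([gp.loc[list(tup),:].diff() for tup in combs], keys=combs).dropna()
    return df_gp

df_dist = (df.set_index('id').groupby('Time')[['X','Y']].apply(relative_dist)
             .droplevel('id').rename_axis(['Time','start_point','end_point'])
             .reset_index())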
Out[341]:
   Time start_point end_point    X    Y
0     1           A         B  2.0  0.0
1     1           A         C  1.0 -0.5
2     1           B         C -1.0 -0.5
3     2           A         B  2.0  0.0
4     2           A         C  1.0  0.5
5     2           B         C -1.0  0.5
6     3           A         B  2.0  0.0
7     3           A         C  1.0  0.0
8     3           B         C -1.0  0.0
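For comparison, the same pairwise offsets can probably also be built without a per-group apply, by merging the frame with itself on Time. This is only a sketch, not the approach used above: the name pairs_dist and the '_start'/'_end' suffixes are made up for illustration, and the filter assumes the ids sort in the order you want the pairs reported (A before B before C).

```python
import pandas as pd

df = pd.DataFrame({
    'Time': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'id': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'X': [1.0, 3.0, 2.0, 2.0, 4.0, 3.0, 3.0, 5.0, 4.0],
    'Y': [1.0, 1.0, 0.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.0],
})

# Pair every point with every point recorded at the same Time.
pairs = df.merge(df, on='Time', suffixes=('_start', '_end'))

# Keep each unordered pair once (A-B, but not B-A or A-A).
# Assumption: comparing the id strings gives the desired pair order.
pairs = pairs[pairs['id_start'] < pairs['id_end']]

# Offset of the end point relative to the start point.
pairs_dist = (
    pd.DataFrame({
        'Time': pairs['Time'],
        'start_point': pairs['id_start'],
        'end_point': pairs['id_end'],
        'X': pairs['X_end'] - pairs['X_start'],
        'Y': pairs['Y_end'] - pairs['Y_start'],
    })
    .sort_values(['Time', 'start_point', 'end_point'])
    .reset_index(drop=True)
)
```

The self-merge creates n*n rows per Time before filtering, so it trades memory for avoiding the Python-level loop over pairs.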
df_avg = df_dist.groupby(['start_point','end_point'], as_index=False)[['X','Y']].mean()
Out[347]:
  start_point end_point    X    Y
0           A         B  2.0  0.0
1           A         C  1.0  0.0
2           B         C -1.0  0.0
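If you want to sanity-check those averages, one option is to recompute them with NumPy broadcasting. This is a rough sketch under a strong assumption: every id appears exactly once at every Time, so the coordinates can be reshaped into a (times, ids, 2) array. The names coords, offsets and mean_offsets are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'id': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'X': [1.0, 3.0, 2.0, 2.0, 4.0, 3.0, 3.0, 5.0, 4.0],
    'Y': [1.0, 1.0, 0.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.0],
})

ids = sorted(df['id'].unique())
n_times = df['Time'].nunique()

# Assumption: each id occurs exactly once per Time, so the sorted
# coordinates reshape cleanly to (n_times, n_ids, 2).
coords = (
    df.sort_values(['Time', 'id'])[['X', 'Y']]
      .to_numpy()
      .reshape(n_times, len(ids), 2)
)

# offsets[t, i, j] = coords[t, j] - coords[t, i]
offsets = coords[:, None, :, :] - coords[:, :, None, :]

# Average over Time: mean_offsets[i, j] is the mean (X, Y) offset
# of point j relative to point i.
mean_offsets = offsets.mean(axis=0)

a, b, c = ids.index('A'), ids.index('B'), ids.index('C')
print(mean_offsets[a, b], mean_offsets[a, c], mean_offsets[b, c])
```

For this data the three printed offsets should agree with the X and Y columns of df_avg for the A-B, A-C and B-C rows.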