I am quite new to Python. I have the following table in Postgres. Each polygon is stored as four coordinates that share the same Id and Zone name. I have loaded this data into a pandas DataFrame called df1.
Id Order Lat Lon Zone
00001 1 50.6373473 3.075029928 A
00001 2 50.63740441 3.075068636 A
00001 3 50.63744285 3.074951754 A
00001 4 50.63737839 3.074913884 A
00002 1 50.6376054 3.0750528 B
00002 2 50.6375896 3.0751209 B
00002 3 50.6374239 3.0750246 B
00002 4 50.6374404 3.0749554 B
I have JSON data with Lon and Lat values, which I have stored in a pandas DataFrame called df2.
Lat Lon
50.6375524099 3.07507914474
50.6375714407 3.07508201591
My task is to compare the df2 Lat and Lon values with the four coordinates of each zone in df1, extract the zone name, and add it to df2.
For instance, (50.6375524099, 3.07507914474) belongs to Zone B.
import numpy as np
import pandas as pd
import matplotlib.path as mplPath

# Polygon vertices with Id and Zone (engine is an existing SQLAlchemy engine)
df1 = pd.read_sql_query("""SELECT * from "zmap" """, con=engine)
# Points with lat, lon values
df2 = pd.read_sql_query("""SELECT * from "E1" """, con=engine)
# zip() is lazy in Python 3, so materialise it before assigning
df2['latlon'] = list(zip(df2.lat, df2.lon))
zones = [
    ["A", [[50.637347297, 3.075029928], [50.637404408, 3.075068636],
           [50.637442847, 3.074951754], [50.637378390, 3.074913884]]]]
for i in range(0, len(zones)):  # for each zone's polygon
    X = mplPath.Path(np.array(zones[i][1]))
    # Boolean mask: which points lie inside this zone
    Y = X.contains_points(df2.latlon.values.tolist())
    # Label the points that are in the current zone (.loc is needed for mask + column)
    df2.loc[Y, 'zone'] = zones[i][0]
Currently I have done this manually for Zone 'A' only. I need to generate the zones for all of the coordinates in df2.
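One way to avoid hard-coding the zones list is to build each polygon from df1 itself. The following is a minimal sketch, not a definitive solution: it assumes df1 has the Lat, Lon, Zone and Order columns shown in the table above, that Order gives the vertex order of each polygon, and that zones do not overlap. The helper name assign_zones is hypothetical, and the sample frames are rebuilt inline instead of being read from Postgres.

```python
import matplotlib.path as mplPath
import pandas as pd

# Sample data copied from the tables in the question (stand-in for the SQL queries).
df1 = pd.DataFrame({
    "Id": ["00001"] * 4 + ["00002"] * 4,
    "Order": [1, 2, 3, 4] * 2,
    "Lat": [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    "Lon": [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    "Zone": ["A"] * 4 + ["B"] * 4,
})
df2 = pd.DataFrame({
    "Lat": [50.6375524099, 50.6375714407],
    "Lon": [3.07507914474, 3.07508201591],
})


def assign_zones(points, polygons):
    """Return a Series holding the zone of each point, or None if it is in no zone.

    Hypothetical helper: `points` needs Lat/Lon columns, `polygons` needs
    Lat/Lon/Zone/Order columns.
    """
    coords = points[["Lat", "Lon"]].to_numpy()
    zone = pd.Series([None] * len(points), index=points.index, dtype=object)
    # Sort so that each polygon's vertices are visited in their stored order.
    ordered = polygons.sort_values(["Zone", "Order"])
    for name, group in ordered.groupby("Zone"):
        path = mplPath.Path(group[["Lat", "Lon"]].to_numpy())
        inside = path.contains_points(coords)  # boolean mask, one entry per point
        zone.loc[inside] = name
    return zone


df2["zone"] = assign_zones(df2, df1)
print(df2)
```

With the sample data, the first point is labelled B, which matches the expectation stated in the question. Points that fall outside every polygon keep None, so they are easy to spot afterwards.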
.at accesses a single element, whereas .loc may return a Series or a DataFrame. .at does not always return a single value, though: it returns an array of values if the given index label occurs multiple times.
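This sounds like a good use case for scipy's cdist, also discussed here. The idea is to find the polygon vertex in df1 that is closest to each point in df2 and take that vertex's zone.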
import pandas as pd
from scipy.spatial.distance import cdist
data1 = {'Lat': pd.Series([50.6373473,50.63740441,50.63744285,50.63737839,50.6376054,50.6375896,50.6374239,50.6374404]),
         'Lon': pd.Series([3.075029928,3.075068636,3.074951754,3.074913884,3.0750528,3.0751209,3.0750246,3.0749554]),
         'Zone': pd.Series(['A','A','A','A','B','B','B','B'])}
data2 = {'Lat': pd.Series([50.6375524099,50.6375714407]),
         'Lon': pd.Series([3.07507914474,3.07508201591])}

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1['point'] = [(x, y) for x,y in zip(df1['Lat'], df1['Lon'])]
df2['point'] = [(x, y) for x,y in zip(df2['Lat'], df2['Lon'])]
df2['closest'] = [closest_point(x, list(df1['point'])) for x in df2['point']]
df2['zone'] = [match_value(df1, 'point', x, 'Zone') for x in df2['closest']]
print(df2)
#          Lat       Lon                           point                  closest zone
# 0  50.637552  3.075079  (50.6375524099, 3.07507914474)  (50.6375896, 3.0751209)    B
# 1  50.637571  3.075082  (50.6375714407, 3.07508201591)  (50.6375896, 3.0751209)    B
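The same nearest-vertex lookup can also be done in one cdist call, without the helper functions or the intermediate point columns. This is only a sketch under the same assumptions as the code above: plain Euclidean distance on the raw Lat/Lon values, and "zone of the nearest vertex" used as a stand-in for "zone containing the point". Those two are not always the same, since a point can be nearest to a vertex of a polygon it lies outside.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Same sample data as above.
df1 = pd.DataFrame({
    "Lat": [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    "Lon": [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    "Zone": ["A", "A", "A", "A", "B", "B", "B", "B"],
})
df2 = pd.DataFrame({
    "Lat": [50.6375524099, 50.6375714407],
    "Lon": [3.07507914474, 3.07508201591],
})

# Distance matrix: one row per df2 point, one column per df1 vertex.
dist = cdist(df2[["Lat", "Lon"]].to_numpy(), df1[["Lat", "Lon"]].to_numpy())

# Position of the nearest vertex for each point, then look up its zone.
nearest = dist.argmin(axis=1)
df2["zone"] = df1["Zone"].to_numpy()[nearest]
print(df2)
```

Computing the whole distance matrix at once avoids calling cdist once per row, which tends to matter when df2 is large.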