Hi I'm trying to merge two datasets on the closest matching date_times.
I have two time stamps for open and closed events.
The merge_asof runs fine on the open date, but returns 'ValueError: left keys must be sorted' on the second date_time.
I sort by the relevant date_time on both occasions.
First dataframe:
idtbl_station_manager date_time_stamp fld_station_number \
0 1121 2017-09-19 15:41:24 AM00571
1 1122 2017-09-19 15:41:24 AM00572
2 1123 2017-09-19 15:41:24 AM00573
fld_grid_number fld_status fld_station_number_int \
0 VOY-024-001 CLOSED 571
1 VOY-024-002 CLOSED 572
2 VOY-024-003 CLOSED 573
fld_activities date_time_stamp_open fld_lat_open \
0 Drift Net,CTD-Overside,Dredge 2017-04-13 07:23:35
1 Drift Net,CTD-Overside,Dredge 2017-04-13 10:15:07 4649.028 S
2 Drift Net,CTD-Overside,Dredge 2017-04-13 13:15:42 4648.497 S
fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close
0 03759.143 E 2017-04-13 09:51:18 4647.361 S 03759.142 E
1 03759.143 E 2017-04-13 12:11:00 4647.344 S 03759.143 E
2 2017-04-13 15:09:26 4647.344 S 03759.143 E
Second dataframe:
idtbl_gpgga date_time_stamp fld_utc fld_lat fld_lat_dir \
1179828 1179829 2017-04-04 02:00:04 000005.00 3354.138 S
0 1 2017-04-04 02:00:05 000006.00 3354.138 S
1 2 2017-04-04 02:00:07 000008.00 3354.138 S
fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt \
1179828 1825.557 E 1 10 0.9 21.6
0 1825.557 E 1 10 0.9 21.6
1 1825.557 E 1 10 0.9 21.6
fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id
1179828 M 31.9 M 0
0 M 31.9 M 0
1 M 31.9 M 0
This works as expected:
# First we grab the open time lat and lons
# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)
#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")
pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' + pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' + pd_open["fld_lon_dir"]
This fails with:
'ValueError: left keys must be sorted'
# Now we grab the close time lat and lons
# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)
#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")
pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' + pd_close["fld_lat_dir"]
pd_close["fld_lat_close"] = pd_close["fld_lon"] + ' ' + pd_close["fld_lon_dir"]
Any advice would be greatly appreciated.
As noted by @JohnE, there were NaT values present in the df_stationManager dataframe.
Resolved by cleaning before merging:
df_stationManager = df_stationManager.dropna()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With