Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pd.merge_asof fails with 'ValueError: left keys must be sorted' on second run

Tags:

python

pandas

Hi I'm trying to merge two datasets on the closest matching date_times.

I have two time stamps for open and closed events.

The merge_asof runs fine on the open date, but returns 'ValueError: left keys must be sorted' on the second date_time.

I sort by the relevant date_time on both occasions.

First dataframe:

   idtbl_station_manager     date_time_stamp fld_station_number  \
0                   1121 2017-09-19 15:41:24            AM00571   
1                   1122 2017-09-19 15:41:24            AM00572   
2                   1123 2017-09-19 15:41:24            AM00573   

  fld_grid_number fld_status  fld_station_number_int  \
0     VOY-024-001     CLOSED                     571   
1     VOY-024-002     CLOSED                     572   
2     VOY-024-003     CLOSED                     573   

                  fld_activities date_time_stamp_open fld_lat_open  \
0  Drift Net,CTD-Overside,Dredge  2017-04-13 07:23:35                
1  Drift Net,CTD-Overside,Dredge  2017-04-13 10:15:07   4649.028 S   
2  Drift Net,CTD-Overside,Dredge  2017-04-13 13:15:42   4648.497 S   

  fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close  
0  03759.143 E   2017-04-13 09:51:18    4647.361 S   03759.142 E  
1  03759.143 E   2017-04-13 12:11:00    4647.344 S   03759.143 E  
2                2017-04-13 15:09:26    4647.344 S   03759.143 E  

Second dataframe:

         idtbl_gpgga     date_time_stamp    fld_utc   fld_lat fld_lat_dir  \
1179828      1179829 2017-04-04 02:00:04  000005.00  3354.138           S   
0                  1 2017-04-04 02:00:05  000006.00  3354.138           S   
1                  2 2017-04-04 02:00:07  000008.00  3354.138           S   

          fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt  \
1179828  1825.557           E               1      10      0.9    21.6   
0        1825.557           E               1      10      0.9    21.6   
1        1825.557           E               1      10      0.9    21.6   

        fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id  
1179828            M          31.9                  M                        0  
0                  M          31.9                  M                        0  
1                  M          31.9                  M                        0  

This works as expected:

# First we grab the open time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")

pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' +  pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' +  pd_open["fld_lon_dir"]     

This fails with:

'ValueError: left keys must be sorted'

# Now we grab the close time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")

pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' +  pd_close["fld_lat_dir"]
pd_close["fld_lat_close"] = pd_close["fld_lon"] + ' ' +  pd_close["fld_lon_dir"]  

Any advice would be greatly appreciated.

like image 985
Knapie Avatar asked Sep 21 '17 10:09

Knapie


1 Answers

As noted by @JohnE, there were NaT values present in the df_stationManager dataframe.

Resolved by cleaning before merging:

df_stationManager = df_stationManager.dropna() 
like image 166
Knapie Avatar answered Nov 06 '22 13:11

Knapie