Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining on datetime64[ns, UTC] fails using pandas.join

I'm trying to join two pandas.DataFrames on a datetime64[ns, UTC] field and it's failing with a ValueError (described below) that is not intuitive to me. Consider the example:

>>> import pandas as pd
>>> import numpy as np
>>> 
>>> s_1 = pd.Series(np.random.randn(2,), index=['1981-12-10', '1984-09-14'])
>>> s_1.index = pd.to_datetime(s_1.index, utc=True)
>>> df_1 = pd.DataFrame(s_1, columns=['s_1']).assign(date=s_1.index)
>>> df_1.dtypes
s_1                 float64
date    datetime64[ns, UTC]
dtype: object
>>> 
>>> d = {
...    'v': np.random.randn(2,),
...    'close': ['1981-12-10', '1984-09-14']
>>> }
>>> df_2 = pd.DataFrame(data=d)
>>> df_2.close = pd.to_datetime(df_2.close, utc=True)
>>> df_2['date'] = df_2.close.apply(lambda x: x.replace(hour=0, minute=0, second=0))
>>> df_2.dtypes
v                    float64
close    datetime64[ns, UTC]
date     datetime64[ns, UTC]
dtype: object
>>> 
>>> df_1.join(df_2, on='date', lsuffix='_')
[...stacktrace ommitted for brevity...]
ValueError: You are trying to merge on datetime64[ns, UTC] and int64 columns. If you wish to proceed you should use pd.concat

Clearly the date field is not an int64. The documentation for join says "Index should be similar to one of the columns in this one." so I set the index for df_2 to the date field and tried again:

>>> df_2.set_index('date', drop=False, inplace=True)
>>> df_1.dtypes
s_1                 float64
date    datetime64[ns, UTC]
dtype: object
>>> df_1.index
DatetimeIndex(['1981-12-10', '1984-09-14'], dtype='datetime64[ns, UTC]', freq=None)
>>> 
>>> df_2.dtypes
v                    float64
close    datetime64[ns, UTC]
date     datetime64[ns, UTC]
dtype: object
>>> df_2.index
DatetimeIndex(['1981-12-10', '1984-09-14'], dtype='datetime64[ns, UTC]', name='date', freq=None)
>>> 
>>> df_1.join(df_2, on='date', lsuffix='_')
[...stacktrace ommitted for brevity...]
ValueError: You are trying to merge on datetime64[ns, UTC] and datetime64[ns] columns. If you wish to proceed you should use pd.concat

Before you suggest I follow the friendly instructions and use pd.concat, I cannot: this is not my code ;)

like image 809
Jason Strimpel Avatar asked Jan 31 '19 01:01

Jason Strimpel


1 Answers

Sometimes index-joining with date time indices does not work. I do not really know why but what worked for me is using merge and before explicitly converting the two merge columns as follows:

df['Time'] = pd.to_datetime(df['Time'], utc = True)

After I did this for both columns that worked for me. You could also try this before using the join operation and to convert the two indices again with the above used procedure.

More correct approach can be found here: Pandas timezone-aware timestamp to naive timestamp conversion

like image 142
inneb Avatar answered Oct 02 '22 06:10

inneb