I'm trying to join two pandas.DataFrames
on a datetime64[ns, UTC]
field and it's failing with a ValueError
(described below) that is not intuitive to me. Consider the example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> s_1 = pd.Series(np.random.randn(2,), index=['1981-12-10', '1984-09-14'])
>>> s_1.index = pd.to_datetime(s_1.index, utc=True)
>>> df_1 = pd.DataFrame(s_1, columns=['s_1']).assign(date=s_1.index)
>>> df_1.dtypes
s_1 float64
date datetime64[ns, UTC]
dtype: object
>>>
>>> d = {
... 'v': np.random.randn(2,),
... 'close': ['1981-12-10', '1984-09-14']
>>> }
>>> df_2 = pd.DataFrame(data=d)
>>> df_2.close = pd.to_datetime(df_2.close, utc=True)
>>> df_2['date'] = df_2.close.apply(lambda x: x.replace(hour=0, minute=0, second=0))
>>> df_2.dtypes
v float64
close datetime64[ns, UTC]
date datetime64[ns, UTC]
dtype: object
>>>
>>> df_1.join(df_2, on='date', lsuffix='_')
[...stacktrace ommitted for brevity...]
ValueError: You are trying to merge on datetime64[ns, UTC] and int64 columns. If you wish to proceed you should use pd.concat
Clearly the date
field is not an int64
. The documentation for join says "Index should be similar to one of the columns in this one." so I set the index for df_2
to the date
field and tried again:
>>> df_2.set_index('date', drop=False, inplace=True)
>>> df_1.dtypes
s_1 float64
date datetime64[ns, UTC]
dtype: object
>>> df_1.index
DatetimeIndex(['1981-12-10', '1984-09-14'], dtype='datetime64[ns, UTC]', freq=None)
>>>
>>> df_2.dtypes
v float64
close datetime64[ns, UTC]
date datetime64[ns, UTC]
dtype: object
>>> df_2.index
DatetimeIndex(['1981-12-10', '1984-09-14'], dtype='datetime64[ns, UTC]', name='date', freq=None)
>>>
>>> df_1.join(df_2, on='date', lsuffix='_')
[...stacktrace ommitted for brevity...]
ValueError: You are trying to merge on datetime64[ns, UTC] and datetime64[ns] columns. If you wish to proceed you should use pd.concat
Before you suggest I follow the friendly instructions and use pd.concat
, I cannot: this is not my code ;)
Sometimes index-joining with date time indices does not work. I do not really know why but what worked for me is using merge and before explicitly converting the two merge columns as follows:
df['Time'] = pd.to_datetime(df['Time'], utc = True)
After I did this for both columns that worked for me. You could also try this before using the join operation and to convert the two indices again with the above used procedure.
More correct approach can be found here: Pandas timezone-aware timestamp to naive timestamp conversion
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With