I have a dataframe df with two columns:
START_DATE MONTHS
0 2015-03-21 240
1 2015-03-21 240
2 2015-03-21 240
3 2015-03-21 240
4 2015-03-21 240
5 2015-01-01 120
6 2017-01-01 240
7 NaN NaN
8 NaN NaN
9 NaN NaN
Both columns have the object dtype:
>>> df.dtypes
START_DATE object
MONTHS object
dtype: object
Now I want to create a new column "Result" by adding the number of months in df['MONTHS'] to df['START_DATE']. This is what I tried:
import pandas as pd
from dateutil.relativedelta import relativedelta
df['START_DATE'] = pd.to_datetime(df['START_DATE'])
df['MONTHS'] = df['MONTHS'].astype(float)
df['offset'] = df['MONTHS'].apply(lambda x: relativedelta(months=x))
df['Result'] = df['START_DATE'] + df['offset']
This raises the following error:
TypeError: incompatible type [object] for a datetime/timedelta operation
Note: I wanted to convert df['MONTHS'] to int, but that doesn't work because the column contains nulls.
Can you please point me in the right direction? Thanks.
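For comparison, here is a minimal sketch that keeps the relativedelta idea from the question. It assumes a pandas version with the nullable "Int64" dtype, which can hold integers alongside missing values (the obstacle mentioned in the note), and it builds a small hypothetical sample frame in place of the real df. Because a column of relativedelta objects is plain object dtype, the offsets are applied element by element instead, producing NaT wherever either input is missing.

```python
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical sample mirroring the question: both columns start as object dtype.
df = pd.DataFrame({
    "START_DATE": ["2015-03-21", "2015-01-01", "2017-01-01", np.nan],
    "MONTHS": ["240", "120", "240", np.nan],
}, dtype=object)

df["START_DATE"] = pd.to_datetime(df["START_DATE"])
# Nullable integer dtype: real integers, with <NA> for the missing rows.
df["MONTHS"] = pd.to_numeric(df["MONTHS"]).astype("Int64")

# A Series of relativedelta objects can't be added to a datetime64 Series,
# so add each offset to each Timestamp individually; NaT where data is missing.
df["Result"] = pd.to_datetime([
    start + relativedelta(months=int(months))
    if pd.notna(start) and pd.notna(months) else pd.NaT
    for start, months in zip(df["START_DATE"], df["MONTHS"])
])
print(df)
```

This is a Python-level loop, so it will be slower than a vectorized approach on large frames, but it follows calendar months exactly.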
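This is a vectorized way to do this, so it should be quite performant. Note that it doesn't follow calendar months (month lengths and month ends are ignored). When the timedelta64[M] values are added to nanosecond datetimes, NumPy treats each month as a fixed 30.436875 days (1/12 of an average Gregorian year of 365.2425 days). That fixed length, not DST, is why the results land on odd times.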
In [32]: df['START_DATE'] + df['MONTHS'].values.astype("timedelta64[M]")
Out[32]:
0 2035-03-20 20:24:00
1 2035-03-20 20:24:00
2 2035-03-20 20:24:00
3 2035-03-20 20:24:00
4 2035-03-20 20:24:00
5 2024-12-31 10:12:00
6 2036-12-31 20:24:00
7 NaT
8 NaT
9 NaT
Name: START_DATE, dtype: datetime64[ns]
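The times in that output can be checked directly. A small sketch, relying on NumPy's behaviour of converting a month to a fixed 1/12 of a 365.2425-day year when casting to a linear unit such as seconds:

```python
import numpy as np

# One calendar month cast to seconds: NumPy uses the average Gregorian month.
one_month = np.timedelta64(1, "M").astype("timedelta64[s]")
days_per_month = one_month / np.timedelta64(1, "D")
print(one_month)       # 2629746 seconds
print(days_per_month)  # 30.436875

# 240 such months are 7304.85 days, i.e. 7304 days plus 20:24:00.
print(np.datetime64("2015-03-21") + 240 * one_month)  # 2035-03-20T20:24:00
```

The same arithmetic gives 10:12:00 for the 120-month row (3652.425 days).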
If you need exact month-end (or month-begin) handling, this is an appropriate method. To keep the same day of the month instead, use pd.DateOffset(months=n) in place of pd.offsets.MonthEnd.
In [33]: df.dropna().apply(lambda x: x['START_DATE'] + pd.offsets.MonthEnd(x['MONTHS']), axis=1)
Out[33]:
0 2035-02-28
1 2035-02-28
2 2035-02-28
3 2035-02-28
4 2035-02-28
5 2024-12-31
6 2036-12-31
dtype: datetime64[ns]
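If the same day of the month is wanted rather than the month end, pd.DateOffset(months=n) can be used in the same row-wise apply. A sketch on a small hypothetical frame shaped like the question's df after conversion; MONTHS is cast to int because it is stored as float:

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's df after conversion.
df = pd.DataFrame({
    "START_DATE": pd.to_datetime(["2015-03-21", "2015-01-01", "2017-01-01", None]),
    "MONTHS": [240.0, 120.0, 240.0, np.nan],
})

# DateOffset(months=n) moves by calendar months and keeps the day of month
# (clipping to the last day when the target month is shorter).
result = df.dropna().apply(
    lambda r: r["START_DATE"] + pd.DateOffset(months=int(r["MONTHS"])),
    axis=1,
)

# Assignment aligns on the index, so the rows removed by dropna() become NaT.
df["Result"] = result
print(df)
```

For this sample the results are 2035-03-21, 2025-01-01, 2037-01-01 and NaT.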