In my pandas dataframe I want to find the difference between two dates in months. Subtracting the results of .dt.to_period('M') gives a MonthEnd object like <11 * MonthEnds> instead of the number of months.
I tried to convert the column with pd.to_numeric() and to strip the letters with re.sub("[^0-9]", "", 'blablabla123bla'). Neither works on a MonthEnd object.
df['duration_dataset'] = df['date_1'].dt.to_period('M') - df['date_2'].dt.to_period('M')
I expected 11, but the output is <11 * MonthEnds>.
Here is a minimal dataframe:
import pandas as pd
d = {'date_1': ['2018-03-31', '2018-09-30'], 'date_2': ['2017-12-31', '2017-12-31']}
df = pd.DataFrame(data=d)
df['date_1'] = pd.to_datetime(df['date_1'], format='%Y-%m-%d')
df['date_2'] = pd.to_datetime(df['date_2'], format='%Y-%m-%d')
df['duration_dataset'] = df['date_1'].dt.to_period('M') - df['date_2'].dt.to_period('M')
df
This is new behaviour in pandas 0.24, where subtracting Period objects gives you a DateOffset subclass.
You can get the numeric value from the DateOffset.n attribute:
from operator import attrgetter
df['duration_dataset'] = (
    df['date_1'].dt.to_period('M') -
    df['date_2'].dt.to_period('M')).apply(attrgetter('n'))
This produces the following for your sample dataframe:
      date_1     date_2  duration_dataset
0 2018-03-31 2017-12-31                 3
1 2018-09-30 2017-12-31                 9
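To see what the .n attribute holds, it may help to look at a single pair of Period objects outside the dataframe. This is only a minimal sketch, assuming pandas 0.24 or later; the variable names are made up for illustration:

```python
import pandas as pd

# Two monthly periods, matching the first row of the sample dataframe.
later = pd.Period('2018-03', freq='M')
earlier = pd.Period('2017-12', freq='M')

# In pandas >= 0.24 this is a DateOffset subclass, not an integer.
offset = later - earlier

# The plain integer month count lives on the .n attribute.
months = offset.n
print(months)
```

The .apply(attrgetter('n')) call in the answer does the same attribute lookup for every element of the column.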
Rather than converting your dates to periods, you could instead convert each date to a count of months since year 0, then subtract those counts:
df['duration_dataset'] = (
    df['date_1'].dt.year * 12 + df['date_1'].dt.month - 1 -
    (df['date_2'].dt.year * 12 + df['date_2'].dt.month - 1)
)
which can be simplified to:
df['duration_dataset'] = (
    12 * (df['date_1'].dt.year - df['date_2'].dt.year) +
    df['date_1'].dt.month - df['date_2'].dt.month
)
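The month-count arithmetic can also be checked on plain dates without pandas. The sketch below uses only the standard library; month_index and months_between are hypothetical helper names introduced here, not part of pandas or of the answer above:

```python
from datetime import date


def month_index(d: date) -> int:
    """Months elapsed since January of year 0 (hypothetical helper)."""
    return d.year * 12 + d.month - 1


def months_between(later: date, earlier: date) -> int:
    """Difference in calendar months; the day of the month is ignored."""
    return month_index(later) - month_index(earlier)


# The two rows of the sample dataframe.
row_0 = months_between(date(2018, 3, 31), date(2017, 12, 31))
row_1 = months_between(date(2018, 9, 30), date(2017, 12, 31))
print(row_0, row_1)  # 3 9
```

Note that this counts calendar month boundaries rather than elapsed time, so 2018-02-28 to 2018-03-01 counts as one month. The period-based version behaves the same way, since to_period('M') also discards the day.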