
MonthEnd object results in <11 * MonthEnds> instead of a number

In my pandas dataframe I want to find the difference between dates in months. Subtracting the results of .dt.to_period('M') gives a MonthEnd object like <11 * MonthEnds> instead of the number of months.

I tried to change the column type with pd.to_numeric() and to remove the letters with re.sub("[^0-9]", "", 'blablabla123bla'). Neither works on a MonthEnd object.

df['duration_dataset'] = df['date_1'].dt.to_period('M') - df['date_2'].dt.to_period('M')

I expected 11, but the output is <11 * MonthEnds>.

Here is a minimal dataframe:

import pandas as pd

d = {'date_1': ['2018-03-31','2018-09-30'], 'date_2': ['2017-12-31','2017-12-31']}
df = pd.DataFrame(data=d)

df['date_1'] = pd.to_datetime(df['date_1'], format='%Y-%m-%d')
df['date_2'] = pd.to_datetime(df['date_2'], format='%Y-%m-%d')

df['duration_dataset'] = df['date_1'].dt.to_period('M') - df['date_2'].dt.to_period('M')

df
Inge asked Jan 31 '19 16:01



1 Answer

This is new behaviour in pandas 0.24, where subtracting Period objects gives you an instance of a DateOffset subclass (here MonthEnd) rather than an integer.

You can get the numeric value from the DateOffset.n attribute:

from operator import attrgetter

df['duration_dataset'] = (
    df['date_1'].dt.to_period('M') -
    df['date_2'].dt.to_period('M')).apply(attrgetter('n'))

This produces

      date_1     date_2  duration_dataset
0 2018-03-31 2017-12-31                 3
1 2018-09-30 2017-12-31                 9

for your sample dataframe.
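
To see where that number comes from, here is a minimal sketch on two scalar periods rather than a whole column. The variable names diff and months are only illustrative, and it assumes pandas 0.24 or later, where the subtraction returns an offset object:

```python
import pandas as pd

# Subtracting two monthly periods yields a DateOffset subclass (MonthEnd),
# not a plain integer (assumes pandas >= 0.24).
diff = pd.Period('2018-03', freq='M') - pd.Period('2017-12', freq='M')

# The integer month count is stored in the offset's `n` attribute.
months = diff.n
print(months)
```

On a Series, .apply(lambda offset: offset.n) should do the same job as attrgetter('n') if you would rather avoid the extra import.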

Rather than convert your dates to periods, you could instead convert them to a month count since the year 0, then subtract those numbers:

df['duration_dataset'] = (
    df['date_1'].dt.year * 12 + df['date_1'].dt.month - 1 -
    (df['date_2'].dt.year * 12 + df['date_2'].dt.month - 1)
)

which can be simplified to

df['duration_dataset'] = (
    12 * (df['date_1'].dt.year - df['date_2'].dt.year) +
    df['date_1'].dt.month - df['date_2'].dt.month
)
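
If you need this in more than one place, the simplified arithmetic could be wrapped in a small helper. This is only a sketch: months_between is a hypothetical name, not a pandas function, and it assumes both arguments are datetime64 Series.

```python
import pandas as pd

def months_between(later, earlier):
    """Calendar-month difference between two datetime Series.

    Hypothetical helper, not part of pandas. The day of the month is ignored.
    """
    return (12 * (later.dt.year - earlier.dt.year)
            + (later.dt.month - earlier.dt.month))

# Same sample data as in the question.
df = pd.DataFrame({
    'date_1': pd.to_datetime(['2018-03-31', '2018-09-30']),
    'date_2': pd.to_datetime(['2017-12-31', '2017-12-31']),
})
df['duration_dataset'] = months_between(df['date_1'], df['date_2'])
result = df['duration_dataset'].tolist()
print(result)
```

Like the period-based version, this counts calendar month boundaries crossed, so 2018-02-28 to 2018-03-01 counts as 1 month even though only a day has passed.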
Martijn Pieters answered Oct 02 '22 01:10
