
How to get last day of each month in Pandas DataFrame index (using TimeGrouper)

I have a DataFrame with incomplete dates, and I only need the row for the last available date in each month. I tried grouping with TimeGrouper and taking .last() of each group.

import pandas as pd
idx = [pd.datetime(2016,2,1),pd.datetime(2017,1,20),pd.datetime(2017,2,1),pd.datetime(2017,2,27)]
df = pd.DataFrame([1,2,3,4],index=idx)
df
            0
2016-02-01  1
2017-01-20  2
2017-02-01  3
2017-02-27  4

Expecting:

df_eom
            0
2016-02-01  1
2017-01-20  2
2017-02-27  4

However I got this:

df_eom = df.groupby(pd.TimeGrouper(freq='1M')).last()
df_eom
              0
2016-02-29  1.0
2016-03-31  NaN
2016-04-30  NaN
2016-05-31  NaN
2016-06-30  NaN
2016-07-31  NaN
2016-08-31  NaN
2016-09-30  NaN
2016-10-31  NaN
2016-11-30  NaN
2016-12-31  NaN
2017-01-31  2.0
2017-02-28  4.0

Not only does it create dates that weren't in df, it also changes the index of the first and last rows of df. Am I using TimeGrouper wrong?

asked Jan 16 '18 18:01 by cxwf




1 Answer

Here's one way:

In [795]: df.iloc[df.reset_index().groupby(df.index.to_period('M'))['index'].idxmax()]
Out[795]:
            0
2016-02-01  1
2017-01-20  2
2017-02-27  4
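
The one-liner above packs several steps together. As a rough sketch of how it can be read, here it is split into named pieces. The variable names months, dates and last_pos are made up for illustration, and pd.Timestamp is used in place of pd.datetime on the assumption that a recent pandas is installed (newer releases no longer provide pd.datetime).

```python
import pandas as pd

idx = [pd.Timestamp(2016, 2, 1), pd.Timestamp(2017, 1, 20),
       pd.Timestamp(2017, 2, 1), pd.Timestamp(2017, 2, 27)]
df = pd.DataFrame([1, 2, 3, 4], index=idx)

# One monthly Period label per row, used only as the grouping key.
# Unlike a month-end frequency grouper, this only produces groups for
# months that actually occur in the data.
months = df.index.to_period('M')

# reset_index() moves the dates into a column named 'index' and leaves a
# 0..n-1 integer index, so idxmax() returns integer row positions.
dates = df.reset_index()['index']
last_pos = dates.groupby(months).idxmax()

# Select those positions from the original frame; its index is untouched.
df_eom = df.iloc[last_pos.to_numpy()]
print(df_eom)
```

Because the result is selected from df itself, the original dates are kept as the index and no empty months are created.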

Or:

In [802]: df.loc[df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())]
Out[802]:
            0
2016-02-01  1
2017-01-20  2
2017-02-27  4
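
Another possibility, not part of the answer above, is to skip groupby entirely and keep only the last row seen for each month. This is a minimal sketch that assumes the index can be sorted chronologically first; it uses pd.to_datetime to build the example index, since pd.datetime and pd.TimeGrouper are not available in newer pandas releases (pd.Grouper is the usual replacement for the latter).

```python
import pandas as pd

idx = pd.to_datetime(['2016-02-01', '2017-01-20', '2017-02-01', '2017-02-27'])
df = pd.DataFrame([1, 2, 3, 4], index=idx)

# Sort so that the last row within each month is also the latest date.
df = df.sort_index()

# One monthly Period label per row.
months = df.index.to_period('M')

# duplicated(keep='last') flags every row except the last one per month,
# so negating the mask keeps exactly one row per month present in df.
df_eom = df[~months.duplicated(keep='last')]
print(df_eom)
```

As with the approaches above, the rows come straight from df, so the index keeps the real dates rather than calendar month ends.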

answered Nov 15 '22 04:11 by Zero