
Python, summarize daily data in dataframe to monthly and quarterly

Tags: python, pandas

I have already loaded my data into a pandas DataFrame.

Example:

Date        Price
2012/12/02  141.25
2012/12/05  132.64
2012/12/06  132.11
2012/12/21  141.64                                                     
2012/12/25  143.19  
2012/12/31  139.66  
2013/01/05  145.11  
2013/01/06  145.99  
2013/01/07  145.97
2013/01/11  145.11  
2013/01/12  145.99  
2013/01/24  145.97
2013/02/23  145.11  
2013/03/24  145.99  
2013/03/28  145.97
2013/04/28  145.97
2013/05/24  145.97
2013/06/23  145.11  
2013/07/24  145.99  
2013/08/28  145.97
2013/09/28  145.97

There are just two columns: one is the date and one is the price.

How can I group or resample the data, starting from 2013, into monthly and quarterly dataframes?

Monthly:

Date        Price
2013/01/01  Monthly total
2013/02/01  Monthly total
2013/03/01  Monthly total
2013/04/01  Monthly total
2013/05/01  Monthly total
2013/06/01  Monthly total
2013/07/01  Monthly total
2013/08/01  Monthly total  
2013/09/01  Monthly total

Quarterly:

Date        Price
2013/01/01  Quarterly total
2013/04/01  Quarterly total
2013/07/01  Quarterly total

Please note that the monthly and quarterly data need to start from the first day of each month, but in the original dataframe the first day of the month is missing, and the number of valid daily rows in each month can vary. Also, the original dataframe has data from 2012 to 2013, and I only need monthly and quarterly data from the beginning of 2013.

I tried something like

result1 = df.groupby([lambda x: x.year, lambda x: x.month], axis=1).sum()

but it does not work.

Thank you!

Windtalker asked Nov 11 '16 18:11


1 Answer

First convert your Date column into a datetime index:

df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)
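
As a minimal sketch of that step, assuming the Date column holds strings in the YYYY/MM/DD form shown in the question. The handful of sample rows and the explicit format argument are illustrative additions, not part of the code above:

```python
import pandas as pd

# A few rows copied from the question; Date starts out as plain strings.
df = pd.DataFrame(
    {
        "Date": ["2012/12/02", "2012/12/31", "2013/01/05", "2013/02/23"],
        "Price": [141.25, 139.66, 145.11, 145.11],
    }
)

# Parse the strings into timestamps. Passing the format explicitly is an
# assumption based on the sample data; it avoids ambiguous date parsing.
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d")

# Use the dates as the index and keep them sorted, so that resampling and
# date-based slicing behave predictably.
df = df.set_index("Date").sort_index()

print(type(df.index).__name__)
```

resample needs a datetime-like index (or an on= column), which is why the conversion comes first.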

Then use resample. The list of offset aliases is in the pandas documentation. To label each bin with the first day of the month, use MS (month start); for the first day of the quarter, use QS (quarter start):

df.resample('QS').sum()
Out[46]: 
              Price
Date               
2012-10-01   830.49
2013-01-01  1311.21
2013-04-01   437.05
2013-07-01   437.93

df.resample('MS').sum()
Out[47]: 
             Price
Date              
2012-12-01  830.49
2013-01-01  874.14
2013-02-01  145.11
2013-03-01  291.96
2013-04-01  145.97
2013-05-01  145.97
2013-06-01  145.11
2013-07-01  145.99
2013-08-01  145.97
2013-09-01  145.97
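
The output above still includes the 2012 rows, while the question asks for totals from the beginning of 2013 only. One possible way to handle that, sketched here under the assumption that the index is a sorted DatetimeIndex, is to slice the frame by date before resampling. The names raw, df_2013, monthly and quarterly are made up for this example:

```python
import io

import pandas as pd

# The sample data from the question, read from an in-memory string.
raw = """Date Price
2012/12/02 141.25
2012/12/05 132.64
2012/12/06 132.11
2012/12/21 141.64
2012/12/25 143.19
2012/12/31 139.66
2013/01/05 145.11
2013/01/06 145.99
2013/01/07 145.97
2013/01/11 145.11
2013/01/12 145.99
2013/01/24 145.97
2013/02/23 145.11
2013/03/24 145.99
2013/03/28 145.97
2013/04/28 145.97
2013/05/24 145.97
2013/06/23 145.11
2013/07/24 145.99
2013/08/28 145.97
2013/09/28 145.97
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d")
df = df.set_index("Date").sort_index()

# Keep only rows dated 2013-01-01 or later (label-based slice on the index).
df_2013 = df.loc["2013-01-01":]

# Sum per month and per quarter; each bin is labelled with its first day.
monthly = df_2013.resample("MS").sum()
quarterly = df_2013.resample("QS").sum()

print(monthly)
print(quarterly)
```

With the question's data this should leave nine monthly rows (2013-01-01 to 2013-09-01) and three quarterly rows, with the same totals as in the output above minus the 2012 entries.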

Zeugma answered Nov 15 '22 22:11