How to group pandas DataFrame by varying dates?

I am trying to roll up daily data into fiscal quarter data. For example, I have a table with fiscal quarter end dates:

Company Period Quarter_End
M       2016Q1 05/02/2015
M       2016Q2 08/01/2015
M       2016Q3 10/31/2015
M       2016Q4 01/30/2016
WFM     2015Q2 04/12/2015
WFM     2015Q3 07/05/2015 
WFM     2015Q4 09/27/2015
WFM     2016Q1 01/17/2016

and a table of daily data:

Company Date       Price
M       06/20/2015 1.05
M       06/22/2015 4.05
M       07/10/2015 3.45
M       07/29/2015 1.86
M       08/24/2015 1.58
M       09/02/2015 8.64
M       09/22/2015 2.56
M       10/20/2015 5.42
M       11/02/2015 1.58
M       11/24/2015 4.58
M       12/03/2015 6.48
M       12/05/2015 4.56
M       01/03/2016 7.14
M       01/30/2016 6.34
WFM     06/20/2015 1.05
WFM     06/22/2015 4.05
WFM     07/10/2015 3.45
WFM     07/29/2015 1.86
WFM     08/24/2015 1.58
WFM     09/02/2015 8.64
WFM     09/22/2015 2.56
WFM     10/20/2015 5.42
WFM     11/02/2015 1.58
WFM     11/24/2015 4.58
WFM     12/03/2015 6.48
WFM     12/05/2015 4.56
WFM     01/03/2016 7.14
WFM     01/17/2016 6.34

And I would like to create the table below.

Company Period  Quarter_End Sum(Price)
M       2016Q2  8/1/2015    10.41
M       2016Q3  10/31/2015  18.2
M       2016Q4  1/30/2016   30.68
WFM     2015Q3  7/5/2015    5.1
WFM     2015Q4  9/27/2015   18.09
WFM     2016Q1  1/17/2016   36.1
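The expected totals follow one rule: a daily row belongs to the first quarter whose end date is on or after its date. In other words, a quarter covers the days after the previous quarter end, up to and including its own end. That is why 01/30/2016 counts toward M's 2016Q4, and why M 2016Q1 and WFM 2015Q2 are absent, since no daily rows fall in them. The minimal check below applies that rule to M's 2016Q3 with a boolean mask. It uses M's rows copied by hand from the table above; the variable names are illustrative.

```python
import pandas as pd

# Daily prices for company M only, copied from the daily table
daily_m = pd.DataFrame({
    "Date": pd.to_datetime([
        "06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015",
        "08/24/2015", "09/02/2015", "09/22/2015", "10/20/2015",
        "11/02/2015", "11/24/2015", "12/03/2015", "12/05/2015",
        "01/03/2016", "01/30/2016",
    ], format="%m/%d/%Y"),
    "Price": [1.05, 4.05, 3.45, 1.86, 1.58, 8.64, 2.56, 5.42,
              1.58, 4.58, 6.48, 4.56, 7.14, 6.34],
})

# 2016Q3 covers the days after the 2016Q2 end (08/01/2015)
# up to and including its own end (10/31/2015)
prev_end = pd.Timestamp("2015-08-01")
this_end = pd.Timestamp("2015-10-31")
in_quarter = (daily_m["Date"] > prev_end) & (daily_m["Date"] <= this_end)

q3_total = round(daily_m.loc[in_quarter, "Price"].sum(), 2)
print(q3_total)  # 18.2, matching the expected table
```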

However, I don't know how to group by varying dates without looping through each record. Any help is greatly appreciated.

Thanks!
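Neither answer below uses `pd.merge_asof`, but it is another vectorized way to express this grouping. With `by="Company"` and `direction="forward"`, it tags each daily row with the first quarter end on or after its date within the same company. A plain groupby sum then produces the table. This is a minimal sketch that assumes the two tables are available as whitespace-separated text with MM/DD/YYYY dates. The frame names `quarters` and `daily` are illustrative.

```python
import io

import pandas as pd

# The two tables from the question, as whitespace-separated text
quarters_txt = """\
Company Period Quarter_End
M   2016Q1 05/02/2015
M   2016Q2 08/01/2015
M   2016Q3 10/31/2015
M   2016Q4 01/30/2016
WFM 2015Q2 04/12/2015
WFM 2015Q3 07/05/2015
WFM 2015Q4 09/27/2015
WFM 2016Q1 01/17/2016
"""

daily_txt = """\
Company Date Price
M   06/20/2015 1.05
M   06/22/2015 4.05
M   07/10/2015 3.45
M   07/29/2015 1.86
M   08/24/2015 1.58
M   09/02/2015 8.64
M   09/22/2015 2.56
M   10/20/2015 5.42
M   11/02/2015 1.58
M   11/24/2015 4.58
M   12/03/2015 6.48
M   12/05/2015 4.56
M   01/03/2016 7.14
M   01/30/2016 6.34
WFM 06/20/2015 1.05
WFM 06/22/2015 4.05
WFM 07/10/2015 3.45
WFM 07/29/2015 1.86
WFM 08/24/2015 1.58
WFM 09/02/2015 8.64
WFM 09/22/2015 2.56
WFM 10/20/2015 5.42
WFM 11/02/2015 1.58
WFM 11/24/2015 4.58
WFM 12/03/2015 6.48
WFM 12/05/2015 4.56
WFM 01/03/2016 7.14
WFM 01/17/2016 6.34
"""

quarters = pd.read_csv(io.StringIO(quarters_txt), sep=r"\s+")
daily = pd.read_csv(io.StringIO(daily_txt), sep=r"\s+")
quarters["Quarter_End"] = pd.to_datetime(quarters["Quarter_End"], format="%m/%d/%Y")
daily["Date"] = pd.to_datetime(daily["Date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by their "on" keys
daily = daily.sort_values("Date")
quarters = quarters.sort_values("Quarter_End")

# For each daily row, take the first quarter end on or after its date,
# matching only rows of the same company (exact date matches are allowed)
tagged = pd.merge_asof(
    daily, quarters,
    left_on="Date", right_on="Quarter_End",
    by="Company", direction="forward",
)

result = tagged.groupby(["Company", "Period", "Quarter_End"], as_index=False)["Price"].sum()
result["Price"] = result["Price"].round(2)  # hide floating-point noise
print(result)
```

Quarters without any daily rows never appear in `tagged`, so they are absent from `result` without an extra `dropna`. Daily rows dated after a company's last quarter end would get a NaN `Period` and be dropped by the groupby.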

asked Oct 19 '16 13:10 by user3357979



2 Answers

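I think you can use pd.merge_ordered, which performs an ordered outer join, to interleave the quarter ends and the daily rows by date within each company: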

import pandas as pd

# df1: quarter-end table, df2: daily prices
# first convert the date columns to datetime
df1.Quarter_End = pd.to_datetime(df1.Quarter_End)
df2.Date = pd.to_datetime(df2.Date)


df = pd.merge_ordered(df1, 
                      df2, 
                      left_on=['Company','Quarter_End'], 
                      right_on=['Company','Date'], 
                      how='outer')
print (df)
   Company  Period Quarter_End       Date  Price
0        M  2016Q1  2015-05-02        NaT    NaN
1        M     NaN         NaT 2015-06-20   1.05
2        M     NaN         NaT 2015-06-22   4.05
3        M     NaN         NaT 2015-07-10   3.45
4        M     NaN         NaT 2015-07-29   1.86
5        M  2016Q2  2015-08-01        NaT    NaN
6        M     NaN         NaT 2015-08-24   1.58
7        M     NaN         NaT 2015-09-02   8.64
8        M     NaN         NaT 2015-09-22   2.56
9        M     NaN         NaT 2015-10-20   5.42
10       M  2016Q3  2015-10-31        NaT    NaN
11       M     NaN         NaT 2015-11-02   1.58
12       M     NaN         NaT 2015-11-24   4.58
13       M     NaN         NaT 2015-12-03   6.48
14       M     NaN         NaT 2015-12-05   4.56
15       M     NaN         NaT 2016-01-03   7.14
16       M  2016Q4  2016-01-30 2016-01-30   6.34
17     WFM  2015Q2  2015-04-12        NaT    NaN
18     WFM     NaN         NaT 2015-06-20   1.05
19     WFM     NaN         NaT 2015-06-22   4.05
20     WFM  2015Q3  2015-07-05        NaT    NaN
21     WFM     NaN         NaT 2015-07-10   3.45
22     WFM     NaN         NaT 2015-07-29   1.86
23     WFM     NaN         NaT 2015-08-24   1.58
24     WFM     NaN         NaT 2015-09-02   8.64
25     WFM     NaN         NaT 2015-09-22   2.56
26     WFM  2015Q4  2015-09-27        NaT    NaN
27     WFM     NaN         NaT 2015-10-20   5.42
28     WFM     NaN         NaT 2015-11-02   1.58
29     WFM     NaN         NaT 2015-11-24   4.58
30     WFM     NaN         NaT 2015-12-03   6.48
31     WFM     NaN         NaT 2015-12-05   4.56
32     WFM     NaN         NaT 2016-01-03   7.14
33     WFM  2016Q1  2016-01-17 2016-01-17   6.34

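Then back-fill the NaN values in the Period and Quarter_End columns with bfill, so every daily row takes the next quarter end on or after its date, and aggregate with sum. To drop the quarters that contain no daily prices, pass min_count=1 to sum (since pandas 0.22 an all-NaN group sums to 0 instead of NaN), then chain Series.dropna and finally reset_index:

df.Period = df.Period.bfill()
df.Quarter_End = df.Quarter_End.bfill()

print(df.groupby(['Company','Period','Quarter_End'])['Price'].sum(min_count=1).dropna().reset_index())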

  Company  Period Quarter_End  Price
0       M  2016Q2  2015-08-01  10.41
1       M  2016Q3  2015-10-31  18.20
2       M  2016Q4  2016-01-30  30.68
3     WFM  2015Q3  2015-07-05   5.10
4     WFM  2015Q4  2015-09-27  18.09
5     WFM  2016Q1  2016-01-17  36.10
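`df.Period.bfill()` fills across company boundaries. That is harmless for this data because each company's last merged row is a quarter end. However, if a company had daily rows dated after its last listed quarter end, those rows would pick up the next company's first period. The hedged variant below back-fills within each company with `groupby(...).bfill()`. The toy data is hypothetical (companies A and B) and is built to trigger that case. The `df1`/`df2` column names follow the answer.

```python
import pandas as pd

# Hypothetical toy data: A has a daily row (03/15) after its last quarter end,
# and B has a quarter (Q0) with no daily rows at all
df1 = pd.DataFrame({
    "Company": ["A", "B", "B"],
    "Period": ["Q1", "Q0", "Q1"],
    "Quarter_End": pd.to_datetime(["2016-03-01", "2016-01-31", "2016-03-31"]),
})
df2 = pd.DataFrame({
    "Company": ["A", "A", "B"],
    "Date": pd.to_datetime(["2016-02-10", "2016-03-15", "2016-03-20"]),
    "Price": [1.0, 2.0, 3.0],
})

df = pd.merge_ordered(df1, df2,
                      left_on=["Company", "Quarter_End"],
                      right_on=["Company", "Date"],
                      how="outer")

# Back-fill only within each company, so no row borrows a later company's period
cols = ["Period", "Quarter_End"]
df[cols] = df.groupby("Company")[cols].bfill()

out = (df.groupby(["Company", "Period", "Quarter_End"])["Price"]
         .sum(min_count=1)   # all-NaN groups become NaN instead of 0
         .dropna()
         .reset_index())
print(out)
```

With this variant, A's 03/15 row stays unassigned (NaN keys are dropped by groupby) instead of being filed under B's Q0, and B's empty Q0 is removed because `min_count=1` makes its sum NaN.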
answered Oct 03 '22 15:10 by jezrael


  • set_index
  • pd.concat to align indices
  • groupby with agg

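import pandas as pd

# assumes Quarter_End and Date were already converted with pd.to_datetime
prd_df = period_df.set_index(['Company', 'Quarter_End'])

prc_df = price_df.set_index(['Company', 'Date'], drop=False)

# sort_index so that bfill walks each company's rows in date order
df = pd.concat([prd_df, prc_df], axis=1).sort_index()

# Date='last' is the last daily date in each quarter, not the quarter end
df.groupby([df.index.get_level_values(0), df.Period.bfill()])  \
  .agg(dict(Date='last', Price='sum')).dropna()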
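Here is a self-contained run of the set_index / pd.concat / groupby approach, using company M's rows from the question. It spells out two assumptions. First, the dates are parsed with pd.to_datetime, because MM/DD/YYYY strings would not sort chronologically. Second, sort_index() is called explicitly after pd.concat so that bfill sees each company's rows in date order, however concat happens to order the union index.

```python
import pandas as pd

# Quarter ends and daily prices for company M, copied from the question
period_df = pd.DataFrame({
    "Company": ["M"] * 4,
    "Period": ["2016Q1", "2016Q2", "2016Q3", "2016Q4"],
    "Quarter_End": pd.to_datetime(
        ["05/02/2015", "08/01/2015", "10/31/2015", "01/30/2016"], format="%m/%d/%Y"),
})
price_df = pd.DataFrame({
    "Company": ["M"] * 14,
    "Date": pd.to_datetime([
        "06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015",
        "08/24/2015", "09/02/2015", "09/22/2015", "10/20/2015",
        "11/02/2015", "11/24/2015", "12/03/2015", "12/05/2015",
        "01/03/2016", "01/30/2016",
    ], format="%m/%d/%Y"),
    "Price": [1.05, 4.05, 3.45, 1.86, 1.58, 8.64, 2.56, 5.42,
              1.58, 4.58, 6.48, 4.56, 7.14, 6.34],
})

prd_df = period_df.set_index(["Company", "Quarter_End"])
prc_df = price_df.set_index(["Company", "Date"], drop=False)

# Align both frames on (Company, date) and put the rows in date order
df = pd.concat([prd_df, prc_df], axis=1).sort_index()

# Each row takes the next quarter label; empty quarters end up with a NaT Date
res = (df.groupby([df.index.get_level_values(0), df["Period"].bfill()])
         .agg({"Date": "last", "Price": "sum"})
         .dropna())
print(res)
```

The 2016Q1 group has no daily rows, so its Date is NaT and dropna removes it even though its Price sum is 0.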


answered Oct 03 '22 15:10 by piRSquared