Below is a snippet of my pivot table output, in .csv format, after using the pandas pivot_table function:
     Sub-Product   11/1/12  11/2/12  11/3/12  11/4/12  11/5/12  11/6/12
GP   Acquisitions      164      168       54       72      203      167
GP   Applications      190      207       65       91      227      200
GPF  Acquisitions     1124     1142      992     1053     1467     1198
GPF  Applications     1391     1430     1269     1357     1855     1510
The only thing I need to do now is use groupby in pandas to sum the values by week for each Sub-Product before I write the result to a .csv file.
Below is the output I want, produced in Excel. The first column does not have to match exactly. The main thing I need is to group the days by week so that the values are summed per week (note how the top row groups the dates into 7-day ranges). I am hoping to do this with Python/pandas. Is it possible?
Row Labels       11/4/12 - 11/10/12   11/11/12 - 11/17/12
GP
  Acquisitions                  926                   728
  Applications                 1092                   889
GPF
  Acquisitions                 8206                  6425
  Applications                10527                  8894
Group data (Excel): In the PivotTable, right-click a value and select Group. In the Grouping box, select the Starting at and Ending at checkboxes, and edit the values if needed. Under By, select a time period. For numerical fields, enter a number that specifies the interval for each group.
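What is the difference between pivot_table and groupby? Both aggregate data by key. groupby is usually enough when the result can stay in long form, with the group keys in the index; pivot_table additionally spreads one or more of the keys across the columns, which makes it convenient for multi-dimensional summaries.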
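To make the comparison concrete, here is a minimal sketch. The long-form frame and its column names (product, sub_product, date, value) are made up for illustration, using a few numbers from the table in the question. It suggests that a pivot_table call can be reproduced with groupby followed by unstack.

```python
import pandas as pd

# Hypothetical long-form data: one row per (product, sub_product, date).
long_df = pd.DataFrame({
    "product": ["GP", "GP", "GP", "GP", "GPF", "GPF", "GPF", "GPF"],
    "sub_product": ["Acquisitions", "Acquisitions", "Applications", "Applications"] * 2,
    "date": ["11/1/12", "11/2/12"] * 4,
    "value": [164, 168, 190, 207, 1124, 1142, 1391, 1430],
})

# pivot_table: aggregate and put the dates across the columns in one call.
pivoted = long_df.pivot_table(
    index=["product", "sub_product"], columns="date", values="value", aggfunc="sum"
)

# groupby: same aggregation in long form, then move "date" to the columns.
grouped = (
    long_df.groupby(["product", "sub_product", "date"])["value"].sum().unstack("date")
)

print(pivoted)
print(grouped)
```

Both frames should hold the same numbers; which form is nicer depends on whether you want the result long or wide.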
The “group by” process is split-apply-combine: (1) splitting the data into groups, (2) applying a function to each group independently, and (3) combining the results into a data structure.
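The three steps can be spelled out by hand and compared with the one-line groupby call. This is only a sketch; the small frame and its column names (product, value) are assumptions for illustration, not the asker's data layout.

```python
import pandas as pd

# Hypothetical data: two rows per product.
long_df = pd.DataFrame({
    "product": ["GP", "GP", "GPF", "GPF"],
    "value": [164, 190, 1124, 1391],
})

# (1) Split: iterate over the groups; (2) apply: sum each group on its own.
pieces = {}
for key, group in long_df.groupby("product"):
    pieces[key] = group["value"].sum()

# (3) Combine: collect the per-group results into one Series.
manual = pd.Series(pieces)

# The usual one-liner does all three steps at once.
combined = long_df.groupby("product")["value"].sum()

print(manual)
print(combined)
```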
The tool you need is resample, which implicitly uses groupby over a time period/frequency and applies a function like mean or sum.
Read data.
In [2]: df
Out[2]:
     Sub-Product   11/1/12  11/2/12  11/3/12  11/4/12  11/5/12  11/6/12
GP   Acquisitions      164      168       54       72      203      167
GP   Applications      190      207       65       91      227      200
GPF  Acquisitions     1124     1142      992     1053     1467     1198
GPF  Applications     1391     1430     1269     1357     1855     1510
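If you want to follow along without the original .csv file, a frame like the one above can be typed in directly. This is a sketch that assumes the product codes (GP, GPF) sit in an unnamed index and Sub-Product is an ordinary column, which is how the printed frame looks.

```python
import pandas as pd

# Rebuild the six-day snippet by hand; the dates are still plain strings here.
df = pd.DataFrame(
    {
        "Sub-Product": ["Acquisitions", "Applications", "Acquisitions", "Applications"],
        "11/1/12": [164, 190, 1124, 1391],
        "11/2/12": [168, 207, 1142, 1430],
        "11/3/12": [54, 65, 992, 1269],
        "11/4/12": [72, 91, 1053, 1357],
        "11/5/12": [203, 227, 1467, 1855],
        "11/6/12": [167, 200, 1198, 1510],
    },
    index=["GP", "GP", "GPF", "GPF"],  # assumed: unnamed index of product codes
)

print(df)
```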
Set up a MultiIndex.
In [4]: df = df.reset_index().set_index(['index', 'Sub-Product'])
In [5]: df
Out[5]:
                    11/1/12  11/2/12  11/3/12  11/4/12  11/5/12  11/6/12
index Sub-Product
GP    Acquisitions      164      168       54       72      203      167
      Applications      190      207       65       91      227      200
GPF   Acquisitions     1124     1142      992     1053     1467     1198
      Applications     1391     1430     1269     1357     1855     1510
Parse the columns as proper datetimes. (They come in as strings.)
In [6]: df.columns = pd.to_datetime(df.columns)
In [7]: df
Out[7]:
                    2012-11-01  2012-11-02  2012-11-03  2012-11-04  \
index Sub-Product
GP    Acquisitions         164         168          54          72
      Applications         190         207          65          91
GPF   Acquisitions        1124        1142         992        1053
      Applications        1391        1430        1269        1357

                    2012-11-05  2012-11-06
index Sub-Product
GP    Acquisitions         203         167
      Applications         227         200
GPF   Acquisitions        1467        1198
      Applications        1855        1510
Resample the columns (axis=1) weekly ('w'), summing by week. (how='sum' or how=np.sum are both valid options here.)
In [10]: df.resample('w', how='sum', axis=1)
Out[10]:
                    2012-11-04  2012-11-11
index Sub-Product
GP    Acquisitions         458         370
      Applications         553         427
GPF   Acquisitions        4311        2665
      Applications        5447        3365
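Recent pandas releases no longer accept the how= argument to resample, and as far as I know axis=1 is deprecated there too. A sketch of the same steps for a current pandas, assuming the frame layout shown above, is to transpose so the dates are on the rows, resample, sum, and transpose back. The explicit date format and the W-SAT variant are my additions, not part of the original answer.

```python
import pandas as pd

# Same six-day snippet as above, typed in by hand.
df = pd.DataFrame(
    {
        "Sub-Product": ["Acquisitions", "Applications", "Acquisitions", "Applications"],
        "11/1/12": [164, 190, 1124, 1391],
        "11/2/12": [168, 207, 1142, 1430],
        "11/3/12": [54, 65, 992, 1269],
        "11/4/12": [72, 91, 1053, 1357],
        "11/5/12": [203, 227, 1467, 1855],
        "11/6/12": [167, 200, 1198, 1510],
    },
    index=["GP", "GP", "GPF", "GPF"],
)

# MultiIndex on the rows, real datetimes on the columns.
df = df.reset_index().set_index(["index", "Sub-Product"])
df.columns = pd.to_datetime(df.columns, format="%m/%d/%y")

# Dates onto the rows, weekly bins ending on Sunday (the "W" default), back again.
weekly = df.T.resample("W").sum().T

# Bins ending on Saturday, i.e. Sunday-to-Saturday weeks like the Excel output.
weekly_sat = df.T.resample("W-SAT").sum().T

print(weekly)
print(weekly_sat)
```

With "W" the columns are labelled by the Sunday that closes each week, which reproduces the numbers in Out[10]. The Excel output in the question groups 11/4/12 through 11/10/12, so "W-SAT" is probably closer to what was asked for. Either result can then be written out with to_csv.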