In Python, I have a pandas DataFrame similar to the following: <pre class="prettyprint"><code>Item | shop1 | shop2 | shop3 | Category ------------------------------------ Shoes| 45 | 50 | 53 | Clothes TV | 200 | 300 | 250 | Technology Book | 20 | 17 | 21 | Books phone| 300 | 350 | 400 | Technology </code></pre> Where shop1, shop2 and shop3 are the costs of every item in different shops. Now, I need to return a DataFrame, after some data cleaning, like this one: <pre class="prettyprint"><code>Category (index)| size| sum| mean | std ---------------------------------------- </code></pre> where size is the number of items in each Category and sum, mean and std are related to the same functions applied to the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply,...) ? Can someone help me out? I'm going crazy with this one...thank you!

Edited for Pandas 0.22+ considering the deprecation of the use of dictionaries in a group by aggregation. We set up a very similar dictionary where we use the keys of the dictionary to specify our functions and the dictionary itself to rename the columns. <pre class="prettyprint"><code>rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std') df.set_index(['Category', 'Item']).stack().groupby('Category') \ .agg(rnm_cols.keys()).rename(columns=rnm_cols) Size Sum Mean Std Category Books 3 58 19.333333 2.081666 Clothes 3 148 49.333333 4.041452 Technology 6 1800 300.000000 70.710678 </code></pre> <hr> option 1 use <code>agg</code> ← link to docs <pre class="prettyprint"><code>agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std') df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs) Std Sum Mean Size Category Books 2.081666 58 19.333333 3 Clothes 4.041452 148 49.333333 3 Technology 70.710678 1800 300.000000 6 </code></pre> option 2 more for less use <code>describe</code> ← link to docs <pre class="prettyprint"><code>df.set_index(['Category', 'Item']).stack().groupby(level=0).describe().unstack() count mean std min 25% 50% 75% max Category Books 3.0 19.333333 2.081666 17.0 18.5 20.0 20.5 21.0 Clothes 3.0 49.333333 4.041452 45.0 47.5 50.0 51.5 53.0 Technology 6.0 300.000000 70.710678 200.0 262.5 300.0 337.5 400.0 </code></pre>

<pre class="prettyprint"><code>df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']}) </code></pre> Or if you want it across all shops then: <pre class="prettyprint"><code>df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'}) df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']}) </code></pre>

Python pandas groupby aggregate on multiple columns, then pivot

In Python, I have a pandas DataFrame similar to the following:

Item | shop1 | shop2 | shop3 | Category ------------------------------------ Shoes| 45    | 50    | 53    | Clothes TV   | 200   | 300   | 250   | Technology Book | 20    | 17    | 21    | Books phone| 300   | 350   | 400   | Technology

Where shop1, shop2 and shop3 are the costs of every item in different shops. Now, I need to return a DataFrame, after some data cleaning, like this one:

Category (index)| size| sum| mean | std ----------------------------------------

where size is the number of items in each Category and sum, mean and std are related to the same functions applied to the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply,...) ?

Can someone help me out? I'm going crazy with this one...thank you!

What is the difference between Groupby and pivot_table in pandas?

What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.

Can you Groupby multiple columns in pandas?

How to groupby multiple columns in pandas DataFrame and compute multiple aggregations? groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.

Is group by or pivot same?

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

What does As_index do in Groupby?

When as_index=True the key(s) you use in groupby() will become an index in the new dataframe. The benefits you get when you set the column as index are: Speed. When you filter values based on the index column eg.

Edited for Pandas 0.22+ considering the deprecation of the use of dictionaries in a group by aggregation.

We set up a very similar dictionary where we use the keys of the dictionary to specify our functions and the dictionary itself to rename the columns.

rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std') df.set_index(['Category', 'Item']).stack().groupby('Category') \   .agg(rnm_cols.keys()).rename(columns=rnm_cols)              Size   Sum        Mean        Std Category                                      Books          3    58   19.333333   2.081666 Clothes        3   148   49.333333   4.041452 Technology     6  1800  300.000000  70.710678

option 1
use agg ← link to docs

agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std') df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs)                    Std   Sum        Mean  Size Category                                      Books        2.081666    58   19.333333     3 Clothes      4.041452   148   49.333333     3 Technology  70.710678  1800  300.000000     6

option 2
more for less
use describe ← link to docs

df.set_index(['Category', 'Item']).stack().groupby(level=0).describe().unstack()              count        mean        std    min    25%    50%    75%    max Category                                                                    Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0 Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0 Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0

df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']})

Or if you want it across all shops then:

df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'}) df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})

Python pandas groupby aggregate on multiple columns, then pivot

Tags:

python

pandas

dataframe

pivot

data-cleaning

Davide Tamburrino

People also ask

2 Answers

piRSquared

Scott Boston

Recent Activity

Donate For Us

Python pandas groupby aggregate on multiple columns, then pivot

Tags:

python

pandas

dataframe

pivot

data-cleaning

Davide Tamburrino

People also ask

2 Answers

piRSquared

Scott Boston

Related questions

Recent Activity

Donate For Us