
Pandas Pivot Table List of Aggfunc

Pandas Pivot Table Dictionary of Agg function

I am trying to calculate three aggregate functions while pivoting:

  1. Count
  2. Mean
  3. StDev

This is the code:

n_page = (pd.pivot_table(Main_DF, 
                         values='SPC_RAW_VALUE',  
                         index=['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], 
                         columns=['LOT_VIRTUAL_LINE'],
                         aggfunc={'N': 'count', 'Mean': np.mean, 'Sigma': np.std})
          .reset_index()
         )

The error I am getting is: KeyError: 'Mean'

How can I calculate those 3 functions?

Felix asked Dec 10 '15 04:12



3 Answers

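The claim in the accepted answer by @Happy001 that aggfunc can't take a dict is false; we can actually pass a dict to aggfunc. The dict keys must be names of the columns listed in values, not labels for the output, which is why the code in the question raises KeyError: 'Mean'.

A really handy feature is the ability to pass a dictionary to aggfunc so you can perform different functions on each of the values you select. For example: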

import pandas as pd
import numpy as np

df = pd.read_excel('sales-funnel.xlsx')  # load the sample workbook

# Quantity gets a row count (len); Price gets both a sum and a mean
table = pd.pivot_table(df,
                       index=['Manager', 'Status'],
                       columns=['Product'],
                       values=['Quantity', 'Price'],
                       aggfunc={'Quantity': len, 'Price': [np.sum, np.mean]},
                       fill_value=0)
table

In the code above, I pass a dictionary to aggfunc, applying len to Quantity and both sum and mean to Price.


The example is taken from pivot table explained.
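Applied to the question, a minimal sketch might look like the following. The small DataFrame is a made-up stand-in for Main_DF (only the column names come from the question), and the string names 'count', 'mean' and 'std' are used in place of function objects. The point it illustrates is that the dict key is the value column, and the list attached to it holds the three aggregations.

```python
import pandas as pd

# Hypothetical stand-in for Main_DF: column names mirror the question,
# the values are invented purely for illustration.
df = pd.DataFrame({
    "ALIAS": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "LOT_VIRTUAL_LINE": ["L1", "L1", "L2", "L2", "L1", "L1", "L2", "L2"],
    "SPC_RAW_VALUE": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 10.0],
})

# The dict key must be a column named in `values`; the list holds the
# aggregations to apply to that column.
table = pd.pivot_table(
    df,
    values="SPC_RAW_VALUE",
    index="ALIAS",
    columns="LOT_VIRTUAL_LINE",
    aggfunc={"SPC_RAW_VALUE": ["count", "mean", "std"]},
)
print(table)
```

The result has hierarchical columns, with one block per aggregation and one column per LOT_VIRTUAL_LINE inside each block.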

Ganesh_ answered Oct 17 '22 17:10


According to the documentation quoted below, the aggfunc argument of pivot_table takes a function or a list of functions, but not a dict:

aggfunc : function, default numpy.mean, or list of functions. If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves).

So try

n_page = (pd.pivot_table(Main_DF, 
                         values='SPC_RAW_VALUE',  
                         index=['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], 
                         columns=['LOT_VIRTUAL_LINE'],
                         aggfunc=[len, np.mean, np.std])
          .reset_index()
         )

You may want to rename the hierarchical columns afterwards.
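One way to do that renaming is to join each column tuple into a single label. The sketch below assumes a made-up stand-in for Main_DF (only the column names come from the question) and swaps in the string names 'count', 'mean' and 'std' for len, np.mean and np.std. Note that 'count' skips missing values, whereas len counts every row.

```python
import pandas as pd

# Hypothetical stand-in for Main_DF: column names mirror the question,
# the values are invented purely for illustration.
df = pd.DataFrame({
    "ALIAS": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "LOT_VIRTUAL_LINE": ["L1", "L1", "L2", "L2", "L1", "L1", "L2", "L2"],
    "SPC_RAW_VALUE": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 10.0],
})

# A list of functions yields hierarchical columns: (function name, line).
table = pd.pivot_table(
    df,
    values="SPC_RAW_VALUE",
    index="ALIAS",
    columns="LOT_VIRTUAL_LINE",
    aggfunc=["count", "mean", "std"],
)

# Flatten each (function, line) tuple into one label such as "mean_L1".
table.columns = ["_".join(str(part) for part in col) for col in table.columns]
table = table.reset_index()
print(table)
```

The "_" separator is an arbitrary choice; any naming scheme that keeps the labels unique would work.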

Happy001 answered Oct 17 '22 17:10


Try using groupby

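df = (Main_DF
      .groupby(['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], as_index=False)
      # named aggregation: output_name=(input_column, function);
      # SPC_RAW_VALUE is the value column from the question
      .agg(N=('SPC_RAW_VALUE', 'count'),
           Mean=('SPC_RAW_VALUE', 'mean'),
           Sigma=('SPC_RAW_VALUE', 'std'))
     )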

Setting as_index=False keeps the grouping keys as ordinary columns in the DataFrame, so you don't have to reset the index afterwards.
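The groupby route gives one row per group rather than one column per LOT_VIRTUAL_LINE. If the wide layout of the pivot is wanted, one possible sketch is to add LOT_VIRTUAL_LINE to the grouping keys and unstack it afterwards. The DataFrame below is a made-up stand-in for Main_DF, and the names N, Mean and Sigma are simply the labels from the question passed as keyword arguments to named aggregation.

```python
import pandas as pd

# Hypothetical stand-in for Main_DF: column names mirror the question,
# the values are invented purely for illustration.
df = pd.DataFrame({
    "ALIAS": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "LOT_VIRTUAL_LINE": ["L1", "L1", "L2", "L2", "L1", "L1", "L2", "L2"],
    "SPC_RAW_VALUE": [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 10.0],
})

# Named aggregation on the value column: keyword = output column name.
stats = (
    df.groupby(["ALIAS", "LOT_VIRTUAL_LINE"])["SPC_RAW_VALUE"]
      .agg(N="count", Mean="mean", Sigma="std")
)

# Move LOT_VIRTUAL_LINE from the row index into the columns,
# giving (statistic, line) column pairs like the pivot table.
wide = stats.unstack("LOT_VIRTUAL_LINE")
print(wide)
```

Here the keys stay in the index so that unstack can move LOT_VIRTUAL_LINE into the columns; a final reset_index() would turn ALIAS back into a regular column.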

Alexander answered Oct 17 '22 17:10