
Python pandas pivot_table showing a derived metric (from two columns)

Tags: python, pandas

I am new to Python. I will try my best to provide enough details.

My data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'company': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'bin1': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'bin2': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'offered': [10, 15, 25, 30, 20, 5, 40, 50, 55, 0],
                   'accepted': [5, 10, 20, 25, 15, 5, 20, 5, 30, 0]})

    id  company bin1    bin2   offered  accepted
0   1   A       1       1      10       5
1   2   A       2       2      15       10
2   3   A       3       3      25       20
3   4   A       1       1      30       25
4   5   A       2       2      20       15
5   6   B       3       3      5        5
6   7   B       1       1      40       20
7   8   B       2       2      50       5
8   9   B       3       3      55       30
9   10  B       1       1      0        0


I want to create a pivot table showing:

- index=['company','bin1']
- columns=['bin2']
- three metrics: sum(offered), sum(accepted), accept_rate (formula = sum of accepted divided by sum of offered)

All I know how to do is this:

df.pivot_table(values=['accepted','offered'], index=['company','bin1'], columns=['bin2'], aggfunc=[np.sum])

               sum
               accepted         offered
bin2           1    2    3      1    2    3
company bin1                        
A      1       30.0 NaN  NaN    40.0 NaN  NaN
       2       NaN  25.0 NaN    NaN  35.0 NaN
       3       NaN  NaN  20.0   NaN  NaN  25.0
B      1       20.0 NaN  NaN    40.0 NaN  NaN
       2       NaN  5.0  NaN    NaN  50.0 NaN
       3       NaN  NaN  35.0   NaN  NaN  60.0

How can I add the third metric (i.e., accept_rate)? Ideally, I want to show all three metrics side by side:

               sum
               accepted         offered          accept_rate
bin2           1    2    3      1    2    3      1      2       3
company bin1                        
A      1       30.0 NaN  NaN    40.0 NaN  NaN    0.75   NaN     NaN
       2       NaN  25.0 NaN    NaN  35.0 NaN    NaN    0.714   NaN
       3       NaN  NaN  20.0   NaN  NaN  25.0   NaN    NaN     0.8
B      1       20.0 NaN  NaN    40.0 NaN  NaN    0.5    NaN     NaN
       2       NaN  5.0  NaN    NaN  50.0 NaN    NaN    0.1     NaN
       3       NaN  NaN  35.0   NaN  NaN  60.0   NaN    NaN     0.58

Please note: offered and accepted are both 0 on the last row/observation, and the real data will also contain zeros. So adding a new column (accepted/offered) to the df and then using aggfunc=np.mean won't work.

Thank you in advance!

adriant42 asked Mar 01 '26 03:03


1 Answer

Note that you can get it in a second step. With the pivot table from the question saved as df1, divide the accepted block by the offered block:

>>> df1['sum']['accepted'] / df1['sum']['offered']
bin2             1         2         3
company bin1
A       1     0.75       NaN       NaN
        2      NaN  0.714286       NaN
        3      NaN       NaN  0.800000
B       1     0.50       NaN       NaN
        2      NaN  0.100000       NaN
        3      NaN       NaN  0.583333
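
To get all three metrics side by side, one option is to glue the blocks together with pd.concat. The following is only a sketch, not necessarily the only or best way: it assumes aggfunc='sum' is passed as a plain string (so there is no extra 'sum' column level), and the names sums, rate and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'company': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'bin1': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'bin2': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'offered': [10, 15, 25, 30, 20, 5, 40, 50, 55, 0],
                   'accepted': [5, 10, 20, 25, 15, 5, 20, 5, 30, 0]})

# Step 1: sum both columns for every (company, bin1) x bin2 cell.
sums = df.pivot_table(values=['accepted', 'offered'],
                      index=['company', 'bin1'],
                      columns='bin2',
                      aggfunc='sum')

# Step 2: ratio of the sums (not a mean of per-row ratios).
rate = sums['accepted'] / sums['offered']

# Step 3: place the three blocks next to each other; the dict keys
# become the outer column level.
result = pd.concat({'accepted': sums['accepted'],
                    'offered': sums['offered'],
                    'accept_rate': rate},
                   axis=1)

print(result.round(3))
```

Because the rate is computed from the summed cells, the row with offered = 0 only contributes zeros to its group's totals and does not skew the result. A cell whose total offered is 0 would come out as NaN rather than raising an error.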
crayxt answered Mar 03 '26 15:03