How to divide the sum of a groupby value by the count of another value

Tags: python, pandas, count, pandas-groupby

I want to group by 'label' and 'month' to sum the Quantity sold for each month and each label.

Dataset

I am trying to use the 'groupby and apply' method to achieve this, but I am not sure how to count the rows per month for each label. For example, for the label value AFFLELOU (DOS) I have two values for the 7th month, so I should sum the quantity sold and divide by 2. For the 9th and 10th months I have just one value each, so the count would be 1 and the quantity sold would be divided by 1.

I wrote the code below, but count is not recognized as a function and I get a "count not defined" error.

t1.groupby(['label', 'month']).apply(lambda x: x['Quantity sold'].sum()/count('month'))

Can someone tell me how to get the count value of each month for each label?

Thanks in Advance.

asked Mar 27 '18 13:03

vishnu prashanth

1 Answer

Instead of summing, counting and dividing, you could use agg('mean'):

t1.groupby(['label', 'month'])['Quantity sold'].agg('mean')

Or, if you do wish to retain the sum and count, use:

t1.groupby(['label', 'month'])['Quantity sold'].agg(['sum', 'count', 'mean'])

For example,

import numpy as np
import pandas as pd

t1 = pd.DataFrame(np.random.randint(4, size=(20,3)), columns=['label', 'Quantity sold', 'month'])
t1.groupby(['label', 'month'])['Quantity sold'].agg(['sum', 'count', 'mean'])

yields a DataFrame like

             sum  count  mean
label month                  
0     1        2      1  2.00
      2        0      1  0.00
      3        2      2  1.00
1     1        1      2  0.50
      2        3      1  3.00
      3        1      1  1.00
2     0        0      1  0.00
      1        0      3  0.00
      3        5      4  1.25
3     0        1      1  1.00
      1        0      1  0.00
      2        0      1  0.00
      3        3      1  3.00
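Because the random example above changes on every run, here is a small deterministic sketch that mirrors the case in the question: two rows for month 7 and one row each for months 9 and 10 under the label AFFLELOU (DOS). The quantities are made up for illustration and are not taken from the asker's dataset.

```python
import pandas as pd

# Hypothetical data: the quantities are invented for illustration only.
t1 = pd.DataFrame({
    'label': ['AFFLELOU (DOS)'] * 4,
    'month': [7, 7, 9, 10],
    'Quantity sold': [10, 20, 5, 8],
})

# One row per (label, month) group with its sum, row count and mean.
result = t1.groupby(['label', 'month'])['Quantity sold'].agg(['sum', 'count', 'mean'])
print(result)
```

For month 7 the sum is 30 and the count is 2, so the mean column holds 15.0, which is the "sum divided by count" value the question asks for.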

Using groupby/agg with its builtin aggregators sum, count and mean is clearly more convenient here, but if you did need to use groupby/apply with a custom function you could use:

t1.groupby(['label', 'month']).apply(lambda x: x['Quantity sold'].sum()/len(x))

Note that while calling custom functions with groupby/apply gives you more flexibility, it comes at a cost because calling a custom Python function once for each group is generally slower than calling the builtin Cythonized aggregators available in groupby/agg.
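As a quick check, the sketch below compares a custom function passed to apply with the builtin mean on a small frame without missing values. The data is hypothetical, and the column is selected before calling apply (a small variation on the line above) so that the function receives only the Quantity sold Series for each group.

```python
import pandas as pd

# Hypothetical data with no missing values.
t1 = pd.DataFrame({
    'label': ['a', 'a', 'a', 'b', 'b'],
    'month': [7, 7, 9, 7, 7],
    'Quantity sold': [10, 20, 5, 4, 6],
})

grouped = t1.groupby(['label', 'month'])['Quantity sold']

# Custom function: called once per group with that group's Series.
via_apply = grouped.apply(lambda s: s.sum() / len(s))

# Builtin aggregator computing the same quantity.
via_mean = grouped.mean()

# Compare the two results group by group.
print((via_apply == via_mean).all())
```

The two results agree here only because no values are missing: len(s) counts every row in the group, including rows where Quantity sold is NaN.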

If you have missing (NaN) values in Quantity sold, it may help to know that groupby/agg has both 'count' and 'size' aggregators:

'count' returns the number of non-NaN values
'size' returns the length of the group (including NaN values)

The count is always less than or equal to the size. The mean is the sum (of the non-NaN values) divided by the count. To see the difference between count and size, you could experiment with this code:

np.random.seed(2018)
t1 = pd.DataFrame(np.random.randint(4, size=(50,3)), columns=['label', 'Quantity sold', 'month'])
t1.loc[np.random.choice([True, False], len(t1)), 'Quantity sold'] = np.nan
t1.groupby(['label', 'month'])['Quantity sold'].agg(['sum', 'count', 'size', 'mean'])

which yields

             sum  count  size      mean
label month                            
0     1      0.0      0     3       NaN
      2      6.0      2     2  3.000000
      3      0.0      0     1       NaN
1     0      3.0      2     5  1.500000
      1      0.0      0     1       NaN
      2      5.0      3     5  1.666667
      3      0.0      2     3  0.000000
2     0      7.0      3     5  2.333333
      1      4.0      4     8  1.000000
      2      5.0      2     3  2.500000
      3      5.0      2     3  2.500000
3     0      1.0      2     5  0.500000
      1      3.0      1     1  3.000000
      2      2.0      1     2  2.000000
      3      2.0      1     3  2.000000
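A smaller, hand-built case may make the distinction easier to follow. This is a minimal sketch with hypothetical values, where one of the two month-7 rows has a missing quantity; the extra 'sum / size' column is added only to show what dividing by the group length (as sum()/len(x) does) would give instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of the two month-7 rows has a missing quantity.
t1 = pd.DataFrame({
    'label': ['a', 'a', 'a'],
    'month': [7, 7, 9],
    'Quantity sold': [10.0, np.nan, 5.0],
})

grouped = t1.groupby(['label', 'month'])['Quantity sold']
stats = grouped.agg(['sum', 'count', 'size', 'mean'])

# Divide by the group length instead of the non-NaN count.
stats['sum / size'] = stats['sum'] / stats['size']
print(stats)
```

For the month-7 group, count is 1 and size is 2, so the mean is 10.0 while sum / size is 5.0. Which of the two is wanted depends on whether a missing quantity should be ignored or treated as a row that still counts toward the divisor.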

answered Oct 10 '22 03:10

unutbu
