Pandas: aggregate based on filter on another column

I have a dataframe that looks like this:

Month   Fruit   Sales
1       Apple   45
1       Bananas 12
3       Apple   6
1       Kiwi    34
12      Melon   12

I'm trying to get a dataframe that looks like this:

Fruit         Sales (month=1)     Sales (month=2)
Apple         55                  65
Bananas       12                  102
Kiwi          54                  78
Melon         132                 43

Right now I have

df=df.groupby(['Fruit']).agg({'Sales':np.sum}).reset_index()

There has to be some way to filter the arguments within agg() based on the "Month" variable. I just haven't been able to find it in the docs. Any help?

Edit: Thanks for the solutions. To complicate things, I would like to sum up another column as well. Example:

Month    Fruit    Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
3       Apple    6      6
1       Kiwi     34     34
12      Melon    12     12

The preferred output would be similar to

            Sales      Revenue
     Fruit   1  3  12  1   3  12
0    Apple  61  6   0  61  6  0
1  Bananas  12  6   0  12  6  0
2     Kiwi  34  0   0  34  0  0
3    Melon   0  0  12  0   0  12

I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc='sum').reset_index(), so my problem is resolved.

I attempted the same thing with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index(), but this throws a TypeError. Can the above operation be done with groupby as well?

410

asked Feb 02 '17 21:02

Tuutsrednas

1 Answer

To answer the updated question, take a slightly different approach. First group by both Month and Fruit, then sum each group. Finally, unstack the Month level so that the months become columns and Fruit remains the index.

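import pandas as pd
from io import StringIO

# Sample data: whitespace-separated, one row per (Month, Fruit) sale
data = '''
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
# Raw string so that \s is passed through as a regex, not a string escape
df = pd.read_csv(StringIO(data), sep=r'\s+')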

df.groupby(['Month', 'Fruit'])\
    .sum()\
    .unstack(level=0)

Result:

        Sales            Revenue           
Month      1    3     12      1    3     12
Fruit                                      
Apple    61.0  6.0   NaN    61.0  6.0   NaN
Bananas  12.0  6.0   NaN    12.0  6.0   NaN
Kiwi     34.0  NaN   NaN    34.0  NaN   NaN
Melon     NaN  NaN  12.0     NaN  NaN  12.0
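
On the follow-up question of whether groupby can also produce the zero-filled table: the following is a minimal sketch, assuming the same sample data as above. It selects the two value columns with a list (double brackets) and passes fill_value=0 to unstack, so missing month/fruit combinations become 0 instead of NaN. The rename_axis(None, 1) call from the question is left out. On two-level columns it is a plausible source of the TypeError, but that is an assumption, not something checked against the asker's pandas version. The name filled is arbitrary.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the answer above
data = '''
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

filled = (
    df.groupby(['Fruit', 'Month'])[['Sales', 'Revenue']]  # list, not a tuple
      .sum()
      .unstack('Month', fill_value=0)  # months become columns, gaps become 0
)
print(filled)
```

The columns are still a two-level index, so a single cell is addressed with a tuple, for example filled.loc['Apple', ('Sales', 1)].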

Old answer

Use the pivot_table method:

import pandas as pd
from io import StringIO

data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')

Result:

Month      1    3     12
Fruit                   
Apple    61.0  6.0   NaN
Bananas  12.0  6.0   NaN
Kiwi     34.0  NaN   NaN
Melon     NaN  NaN  12.0
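
To get closer to the layout in the original question (one flat "Sales (month=N)" column per month, with zeros instead of NaN), here is a small sketch that builds on the pivot_table call above. The fill_value=0 argument and the column relabelling are additions for illustration, and the name wide is arbitrary.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the old answer
data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

# One row per fruit, one column per month; absent combinations become 0
wide = df.pivot_table('Sales', index='Fruit', columns='Month',
                      aggfunc='sum', fill_value=0)

# Flatten the month labels into the headers used in the question
wide.columns = [f'Sales (month={m})' for m in wide.columns]
wide = wide.reset_index()
print(wide)
```

If only one month is of interest, filtering before grouping is a simpler alternative: df[df['Month'] == 1].groupby('Fruit')['Sales'].sum().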

186

answered Oct 23 '22 10:10

dotcs

Tags: python, pandas, aggregate
