
Pandas: aggregate based on filter on another column

I have a dataframe that looks like this

Month   Fruit   Sales
1       Apple   45
1       Bananas 12
3       Apple   6
1       Kiwi    34
12      Melon   12

I'm trying to get a dataframe that goes like this

Fruit         Sales (month=1)     Sales (month=2)
Apple         55                  65
Bananas       12                  102
Kiwi          54                  78
Melon         132                 43

Right now I have

df=df.groupby(['Fruit']).agg({'Sales':np.sum}).reset_index()

There has to be some way to filter the arguments within agg() based on the "Month" variable. I just haven't been able to find it in the docs. Any help?

Edit: Thanks for the solutions. To complicate things, I would like to sum up another column as well. Example:

Month    Fruit    Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
3       Apple    6      6
1       Kiwi     34     34
12      Melon    12     12

The preferred output would be similar to

            Sales      Revenue
     Fruit   1  3  12  1   3  12
0    Apple  61  6   0  61  6  0
1  Bananas  12  6   0  12  6  0
2     Kiwi  34  0   0  34  0  0
3    Melon   0  0  12  0   0  12

I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc='sum', fill_value=0).reset_index(), so my problem is resolved.

I attempted the same thing with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index(), but this throws a TypeError. Can the above operation be done with groupby as well?

Tuutsrednas asked Feb 02 '17 21:02




1 Answer

To answer the updated question, you need a slightly different approach. First group by both Month and Fruit, then sum each group. Finally, unstack the Month level: the months become columns (one set per value column), and Fruit remains as the index.

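import pandas as pd
from io import StringIO

# Sample data: whitespace-separated, with repeated (Month, Fruit) pairs to sum
data = '''\
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Sum Sales and Revenue per (Month, Fruit), then move the Month level into the columns
df.groupby(['Month', 'Fruit'])\
    .sum()\
    .unstack(level=0)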

Result:

        Sales            Revenue           
Month      1    3     12      1    3     12
Fruit                                      
Apple    61.0  6.0   NaN    61.0  6.0   NaN
Bananas  12.0  6.0   NaN    12.0  6.0   NaN
Kiwi     34.0  NaN   NaN    34.0  NaN   NaN
Melon     NaN  NaN  12.0     NaN  NaN  12.0
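
To get closer to the output requested in the question (zeros instead of NaN, and Fruit as a regular column), one possible variation is sketched below. It assumes a reasonably recent pandas version, selects the two value columns with a list (double brackets), and passes fill_value=0 to unstack. The names wide and flat are only illustrative.

```python
import pandas as pd
from io import StringIO

data = '''\
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Group by Fruit and Month, sum both value columns, then spread Month
# across the columns; missing (Fruit, Month) combinations become 0.
wide = (
    df.groupby(['Fruit', 'Month'])[['Sales', 'Revenue']]
      .sum()
      .unstack('Month', fill_value=0)
)

# Optional: turn the Fruit index back into an ordinary column.
flat = wide.reset_index()
print(flat)
```

The columns of wide are a two-level MultiIndex, so a single cell can be read with a tuple, for example wide.loc['Apple', ('Sales', 1)].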

Old answer (for the original question, with a single Sales column):

Use the pivot_table method:

import pandas as pd
from io import StringIO

data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')

Result:

Month      1    3     12
Fruit                   
Apple    61.0  6.0   NaN
Bananas  12.0  6.0   NaN
Kiwi     34.0  NaN   NaN
Melon     NaN  NaN  12.0
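
If the column labels should read like the "Sales (month=1)" headers in the question, a small follow-up step could rename them. The sketch below is one way to do that, assuming pivot_table's fill_value parameter to replace the missing combinations with 0; the label format and the variable name table are illustrative choices, not part of the original answer.

```python
import pandas as pd
from io import StringIO

data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

# One row per Fruit, one column per Month, summed Sales, 0 where no data exists.
table = df.pivot_table('Sales', index='Fruit', columns='Month',
                       aggfunc='sum', fill_value=0)

# Replace the bare month numbers with descriptive labels.
table.columns = [f'Sales (month={m})' for m in table.columns]

# Make Fruit an ordinary column again.
table = table.reset_index()
print(table)
```

Only months that actually occur in the data (here 1, 3 and 12) produce a column; a month with no rows at all would have to be added separately, for example with reindex.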
dotcs answered Oct 23 '22 10:10