I have a dataframe that looks like this
Month Fruit Sales
1 Apple 45
1 Bananas 12
3 Apple 6
1 Kiwi 34
12 Melon 12
I'm trying to get a dataframe that goes like this
Fruit Sales (month=1) Sales (month=2)
Apple 55 65
Bananas 12 102
Kiwi 54 78
Melon 132 43
Right now I have
df=df.groupby(['Fruit']).agg({'Sales':np.sum}).reset_index()
There has to be some way to filter the arguments within agg() based on the "Month" variable. I just haven't been able to find it in the docs. Any help?
Edit: Thanks for the solutions. To complicate things, I would like to sum up another column as well. Example:
Month Fruit Sales Revenue
1 Apple 45 45
1 Bananas 12 12
3 Apple 6 6
1 Kiwi 34 34
12 Melon 12 12
The preferred output would be similar to
Sales Revenue
Fruit 1 3 12 1 3 12
0 Apple 61 6 0 61 6 0
1 Bananas 12 6 0 12 6 0
2 Kiwi 34 0 0 34 0 0
3 Melon 0 0 12 0 0 12
I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc='sum').reset_index(), so my problem is resolved.
I attempted the same thing with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index(), but this throws a TypeError. Can the above operation be done with groupby as well?
To calculate mean values grouped by another column in pandas, call groupby() on that column and then apply the mean() method, which returns the average of each group.
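As a minimal sketch of the groupby-then-mean pattern, the snippet below averages Sales per Fruit. The sample values are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's Fruit/Sales columns.
df = pd.DataFrame({
    "Fruit": ["Apple", "Bananas", "Apple", "Kiwi"],
    "Sales": [45, 12, 15, 34],
})

# Average Sales per Fruit; the result is a Series indexed by Fruit.
mean_sales = df.groupby("Fruit")["Sales"].mean()
print(mean_sales)
```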
If you want to get a single value for each group, use aggregate() (or one of its shortcuts, such as sum() or mean()). If you want to get a subset of the original rows, use filter(). And if you want to get a new value for each original row, use transform().
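The three cases can be sketched side by side. This is only an illustration: the threshold of 20 and the FruitTotal column name are assumptions, and the per-row method used here is pandas' GroupBy transform().

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Bananas", "Apple", "Kiwi"],
    "Sales": [45, 12, 6, 34],
})
grouped = df.groupby("Fruit")["Sales"]

# Aggregation: one value per group.
totals = grouped.agg("sum")

# Filtration: a subset of the original rows. Keep only fruits whose
# total sales exceed 20 (the threshold is an arbitrary assumption).
big_sellers = df.groupby("Fruit").filter(lambda g: g["Sales"].sum() > 20)

# Transformation: one value per original row, here the total of the
# row's own group, stored in a hypothetical FruitTotal column.
df["FruitTotal"] = grouped.transform("sum")
```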
GROUP BY enables you to use aggregate functions on groups of data returned from a query. FILTER is a modifier used on an aggregate function to limit the values used in an aggregation. All the columns in the select statement that aren't aggregated should be specified in a GROUP BY clause in the query.
To answer the updated question, you need to do things a little differently. First group by both Month and Fruit. Then sum each group and unstack the Month level, which turns the months into columns and leaves Fruit as the index.
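import pandas as pd
from io import StringIO

data = '''
Month Fruit Sales Revenue
1 Apple 45 45
1 Bananas 12 12
1 Apple 16 16
3 Apple 6 6
1 Kiwi 34 34
3 Bananas 6 6
12 Melon 12 12
'''
# The columns are separated by whitespace
df = pd.read_csv(StringIO(data), sep=r'\s+')
# Sum per (Month, Fruit), then move Month (index level 0) into the columns
df.groupby(['Month', 'Fruit'])\
.sum()\
.unstack(level=0)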
Result
Sales Revenue
Month 1 3 12 1 3 12
Fruit
Apple 61.0 6.0 NaN 61.0 6.0 NaN
Bananas 12.0 6.0 NaN 12.0 6.0 NaN
Kiwi 34.0 NaN NaN 34.0 NaN NaN
Melon NaN NaN 12.0 NaN NaN 12.0
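If zeros are preferred over NaN, as in the output requested in the question, one possible variant passes fill_value=0 to unstack. The sketch below assumes a recent pandas version, where several columns are selected from a GroupBy with a list (double brackets). It is one way to write it, not necessarily the only fix for the error mentioned in the question.

```python
import pandas as pd
from io import StringIO

data = '''\
Month Fruit Sales Revenue
1 Apple 45 45
1 Bananas 12 12
1 Apple 16 16
3 Apple 6 6
1 Kiwi 34 34
3 Bananas 6 6
12 Melon 12 12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

wide = (
    df.groupby(['Fruit', 'Month'])[['Sales', 'Revenue']]  # list, not a bare tuple
      .sum()
      .unstack('Month', fill_value=0)  # missing Fruit/Month pairs become 0
)
print(wide)
```

Calling wide.reset_index() afterwards turns Fruit back into an ordinary column.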
Use the pivot_table method:
import pandas as pd
from io import StringIO
data = '''\
Month Fruit Sales
1 Apple 45
1 Bananas 12
1 Apple 16
3 Apple 6
1 Kiwi 34
3 Bananas 6
12 Melon 12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')
df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')
Result:
Month 1 3 12
Fruit
Apple 61.0 6.0 NaN
Bananas 12.0 6.0 NaN
Kiwi 34.0 NaN NaN
Melon NaN NaN 12.0
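For the updated question with two value columns, the same pivot_table call can take a list of values. The sketch below also passes fill_value=0 so that missing Fruit/Month combinations show 0 instead of NaN. The Revenue figures are the ones from the question's edit.

```python
import pandas as pd
from io import StringIO

data = '''\
Month Fruit Sales Revenue
1 Apple 45 45
1 Bananas 12 12
1 Apple 16 16
3 Apple 6 6
1 Kiwi 34 34
3 Bananas 6 6
12 Melon 12 12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Sum both value columns per Fruit and Month; the result has
# two-level columns of the form (value name, month).
wide = df.pivot_table(
    values=['Sales', 'Revenue'],
    index='Fruit',
    columns='Month',
    aggfunc='sum',
    fill_value=0,
)
print(wide)
```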