I'm trying to identify the best way to make a simple pivot on my data:
import pandas
dfn = pandas.DataFrame({
    "A": ['aaa', 'bbb', 'aaa', 'bbb'],
    "B": [1, 10, 2, 30],
    "C": [2, 0, 3, 20]})
The output I would like is a DataFrame grouped by A that sums the values of B and C and counts the rows in each group, with columns named exactly Sum_B, Sum_C, and Count, as follows:
A Sum_B Sum_C Count
aaa 3 5 2
bbb 40 20 2
What is the fastest way to do this?
Often you may want to calculate the sum and the count of the same field in a pivot table in Excel. You can easily do this by dragging the same field into the Values box twice when creating a pivot table.
You can use the aggfunc= (aggregation function) parameter to change how data are aggregated in a pivot table. By default, Pandas uses the mean to aggregate data. You can pass a function name as a string, such as 'mean', 'sum', or 'max', or a callable such as np.
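As a minimal sketch of the aggfunc= behaviour described above, the following applies pandas.pivot_table to the question's dfn, first with the default aggregation and then with a sum. The variable names mean_table and sum_table are illustrative only.

```python
import pandas as pd

dfn = pd.DataFrame({
    "A": ["aaa", "bbb", "aaa", "bbb"],
    "B": [1, 10, 2, 30],
    "C": [2, 0, 3, 20],
})

# With no aggfunc, pivot_table averages each value column per group.
mean_table = pd.pivot_table(dfn, index="A", values=["B", "C"])

# Passing aggfunc="sum" switches the aggregation to a per-group sum.
sum_table = pd.pivot_table(dfn, index="A", values=["B", "C"], aggfunc="sum")

print(mean_table)
print(sum_table)
```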
Counting distinct values in a Pandas pivot: to count the unique occurrences of an observation, we need a different aggregation method. Passing aggfunc=pd.Series.nunique counts only the distinct values within each group of the pivoted DataFrame.
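A short sketch of the difference between counting all values and counting distinct values. The DataFrame here is hypothetical (it is not the question's data) and deliberately repeats a value inside one group.

```python
import pandas as pd

# Hypothetical data: group "aaa" contains the value 1 twice.
df = pd.DataFrame({
    "A": ["aaa", "aaa", "aaa", "bbb"],
    "B": [1, 1, 2, 30],
})

# "count" counts every non-null value in each group.
all_values = pd.pivot_table(df, index="A", values=["B"], aggfunc="count")

# pd.Series.nunique counts only the distinct values in each group.
distinct_values = pd.pivot_table(df, index="A", values=["B"], aggfunc=pd.Series.nunique)

print(all_values)
print(distinct_values)
```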
Pandas DataFrame sum() method: the sum() method adds all values in each column and returns the sum for each column. By specifying the column axis (axis='columns'), the sum() method instead adds across the columns and returns the sum of each row.
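To make the two directions concrete, here is a small sketch using the numeric columns of the question's dfn. The names numeric, column_totals, and row_totals are illustrative.

```python
import pandas as pd

dfn = pd.DataFrame({
    "A": ["aaa", "bbb", "aaa", "bbb"],
    "B": [1, 10, 2, 30],
    "C": [2, 0, 3, 20],
})

# Restrict to the numeric columns so the string column A is not summed.
numeric = dfn[["B", "C"]]

# Default: one total per column.
column_totals = numeric.sum()

# axis="columns": one total per row, adding B and C together.
row_totals = numeric.sum(axis="columns")

print(column_totals)
print(row_totals)
```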
You can use the .agg() function:
In [227]: dfn.groupby('A').agg({'B':sum, 'C':sum, 'A':'count'}).rename(columns={'A':'count'})
Out[227]:
B count C
A
aaa 3 2 5
bbb 40 2 20
Or with reset_index():
In [239]: dfn.groupby('A').agg({'B':sum, 'C':sum, 'A':'count'}).rename(columns={'A':'count'}).reset_index()
Out[239]:
A B count C
0 aaa 3 2 5
1 bbb 40 2 20
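The output above keeps the column names B and C and a lowercase count, whereas the question asks for exactly Sum_B, Sum_C, and Count. One possible way to get those names in a single step is named aggregation, sketched below. This assumes pandas 0.25 or later, where keyword arguments to .agg() of the form new_name=(column, function) are supported. It is an alternative, not the method used in the answer above.

```python
import pandas as pd

dfn = pd.DataFrame({
    "A": ["aaa", "bbb", "aaa", "bbb"],
    "B": [1, 10, 2, 30],
    "C": [2, 0, 3, 20],
})

result = (
    dfn.groupby("A")
       .agg(
           Sum_B=("B", "sum"),     # per-group total of B
           Sum_C=("C", "sum"),     # per-group total of C
           Count=("B", "count"),   # number of non-null B values per group
       )
       .reset_index()              # turn the group key A back into a column
)

print(result)
```

Note that "count" ignores missing values. If B could contain NaN and you want the number of rows per group regardless, ("B", "size") is one option.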
PS Here is a link to examples provided by @evan54