I'd like to get some % rates based on a .groupby()
in pandas
. My goal is to take an indicator column Ind
and get the Rate of A (numerator) divided by the total (A+B) in that year
Example Data:
import pandas as pd
import numpy as np
df: pd.DataFrame = pd.DataFrame([['2011','A',1,2,3], ['2011','B',4,5,6],['2012','A',15,20,4],['2012','B',17,12,12]], columns=["Year","Ind","X", "Y", "Z"])
print(df)
Year Ind X Y Z
0 2011 A 1 2 3
1 2011 B 4 5 6
2 2012 A 15 20 4
3 2012 B 17 12 12
Example for year 2011: XRate
would be summing up the A indicators for X (which would be 1) and dividing byt the total (A+B) which would be 5 thus I would receive an Xrate of 0.20.
I would like to do this for all columns X, Y, Z to get the rates. I've tried doing lambda applys but can't quite get the desired results.
Desired Results:
Year XRate YRate ZRate
0 2011 0.20 0.29 0.33
1 2012 0.47 0.63 0.25
You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.
To calculate a percentage in Python, use the division operator (/) to get the quotient from two numbers and then multiply this quotient by 100 using the multiplication operator (*) to get the percentage. This is a simple equation in mathematics to get the percentage.
Pandas DataFrame sum() Method The sum() method adds all values in each column and returns the sum for each column. By specifying the column axis ( axis='columns' ), the sum() method searches column-wise and returns the sum of each row.
You can group
the dataframe on Year
and aggregate using sum:
s1 = df.groupby('Year').sum()
s2 = df.query("Ind == 'A'").groupby('Year').sum()
s2.div(s1).round(2).add_suffix('Rate')
XRate YRate ZRate
Year
2011 0.20 0.29 0.33
2012 0.47 0.62 0.25
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With