I have the following code:
import numpy as np
import pandas as pd

obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})
g1 = obs.groupby(['lightning', 'thunder']).agg({'p': 'sum'})
g2 = obs.groupby(['lightning', 'thunder', 'storm']).agg({'p': 'sum'})
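which gives two DataFrames of summed p: g1, indexed by (lightning, thunder), and g2, indexed by (lightning, thunder, storm).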
Now, how do I divide the more detailed groupby result by the less detailed one (to calculate percentages)?
I have read "Pandas percentage of total with groupby" but was unable to work out how to adapt it to my case.
A groupby operation works in three steps. Step 1: split the data into groups by creating a groupby object from the original DataFrame. Step 2: apply a function, in this case an aggregation function that computes a summary statistic (you can also transform or filter your data in this step). Step 3: combine the results into a new DataFrame.
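The three steps above can be sketched on the question's own data. This is only an illustration: the intermediate names grouped, summed and result are made up here and do not appear in the question.

```python
import pandas as pd

obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})

# Step 1: split. Nothing is computed yet; this is a lazy groupby object.
grouped = obs.groupby(['lightning', 'thunder'])

# Step 2: apply. Sum the 'p' column within each group.
summed = grouped['p'].sum()

# Step 3: combine. The sums come back as one Series indexed by the group
# keys; reset_index() turns those keys back into ordinary columns.
result = summed.reset_index()
print(result)
```

Calling .agg({'p': 'sum'}) as in the question performs steps 2 and 3 in a single call and returns a DataFrame instead of a Series.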
The division operator (/) is the simplest way to divide two columns in pandas: it divides the first column by the second, element by element.
The div() method performs element-wise division of one pandas DataFrame by another. A DataFrame can also be divided by a pandas Series or by a Python sequence. Calling div() on a DataFrame is equivalent to using the division operator (/), except that div() also accepts parameters such as axis that control how the divisor is aligned.
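A small sketch of the difference between / and div(), using a toy frame invented for this illustration (df and totals are not part of the question). By default a Series divisor is aligned against the column labels; axis=0 aligns it against the row index instead.

```python
import pandas as pd

df = pd.DataFrame({'a': [2.0, 4.0], 'b': [6.0, 8.0]})
totals = pd.Series([2.0, 4.0])  # one divisor per row, index 0 and 1

# Column by column: plain operator, element-wise.
col_ratio = df['a'] / df['b']

# Every column divided by a per-row Series: axis=0 matches the Series
# index to the row index rather than to the column labels.
by_row = df.div(totals, axis=0)

# For two frames with the same labels, div() and / agree.
same = df.div(df).equals(df / df)

print(by_row)
```

This axis=0 alignment is what allows a frame to be divided row-wise by a Series of group totals.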
Grouping by multiple columns with multiple aggregation functions. You can group a data set by several columns in pandas and apply more than one aggregation to each group, for example a sum and an average of the aggregated column.
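As a hedged illustration on the question's data, the sketch below groups by two columns and computes both a sum and a mean of p. The name summary is made up for this example.

```python
import pandas as pd

obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})

# Passing a list of function names yields one output column per function.
summary = obs.groupby(['lightning', 'thunder'])['p'].agg(['sum', 'mean'])
print(summary)
```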
Use
g2.unstack()
to move the last index level (storm) into the columns. Then divide by g1.p with axis=0, so the division broadcasts over the columns. Then stack again to restore the original index:
g2.unstack().div(g1.p, axis=0).stack()
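To check the result, the sketch below runs the one-liner on the question's data and confirms that the ratios within each (lightning, thunder) pair sum to 1. It also shows an alternative based on groupby(...).transform('sum'). That alternative is a different technique, not the one used in this answer, and the names ratio and alt are made up here.

```python
import numpy as np
import pandas as pd

obs = pd.DataFrame({
    'storm': [1, 1, 1, 1, 0, 0, 0, 0],
    'lightning': [1, 1, 0, 0, 1, 1, 0, 0],
    'thunder': [1, 0, 1, 0, 1, 0, 1, 0],
    'p': [0.20, 0.05, 0.04, 0.36, 0.04, 0.01, 0.03, 0.27]
})
g1 = obs.groupby(['lightning', 'thunder']).agg({'p': 'sum'})
g2 = obs.groupby(['lightning', 'thunder', 'storm']).agg({'p': 'sum'})

# The approach from the answer: unstack, divide row-wise, stack back.
ratio = g2.unstack().div(g1.p, axis=0).stack()

# Alternative sketch: divide each value by the total of its own
# (lightning, thunder) group, computed in place with transform.
alt = g2['p'] / g2.groupby(level=['lightning', 'thunder'])['p'].transform('sum')

# Within each (lightning, thunder) pair the shares should add up to 1.
group_totals = ratio.groupby(level=['lightning', 'thunder'])['p'].sum()
print(ratio)
print(np.allclose(group_totals, 1.0))
```

Both variants give the share of p that each storm value contributes within its (lightning, thunder) pair. The transform version does not need g1 at all.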