I have a column of sites: ['Canada', 'USA', 'China' ....]
Each site occurs many times in the SITE column and next to each instance is a true or false value.
INDEX | VALUE | SITE
0 | True | Canada
1 | False | Canada
2 | True | USA
3 | True | USA
And it goes on.
Goal 1: I want to find, for each site, what percent of the VALUE column is True.
Goal 2: I want to return a list of sites where % True in the VALUE column is greater than 10%.
How do I use groupby to achieve this? I only know how to use groupby to find the mean for each site which won't help me here.
You can calculate the percentage of total with the groupby of pandas DataFrame by using DataFrame. groupby() , DataFrame. agg() , DataFrame. transform() methods and DataFrame.
Pandas DataFrame pct_change() Method The pct_change() method returns a DataFrame with the percentage difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
A Percentage is calculated by the mathematical formula of dividing the value by the sum of all the values and then multiplying the sum by 100. This is also applicable in Pandas Dataframes. Here, the pre-defined sum() method of pandas series is used to compute the sum of all the values of a column.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
Something like this:
In [13]: g = df.groupby('SITE')['VALUE'].mean()
In [14]: g[g > 0.1]
Out[14]:
SITE
Canada 0.5
USA 1.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With