I have a DataFrame (mydf) along the lines of the following:
Index Feature ID Stuff1 Stuff2
1 True 1 23 12
2 True 1 54 12
3 False 0 45 67
4 True 0 38 29
5 False 1 32 24
6 False 1 59 39
7 True 0 37 32
8 False 0 76 65
9 False 1 32 12
10 True 0 23 15
..n True 1 21 99
I am trying to calculate the True and False percentages of Feature for each ID (0 or 1), and I am looking for two outputs, one per ID:
Feature ID Percent
True 1 20%
False 1 30%
Feature ID Percent
True 0 30%
False 0 20%
I have made a few attempts, but I end up with counts for all columns and then a percentage for all columns.
Here's my bad attempt:
percentageID0 = mydf[ mydf['ID']==0 ].set_index(['Feature']).count()
percentageID1 = mydf[ mydf['ID']==1 ].set_index(['Feature']).count()
fullcount = (mydf.groupby(['ID']).count()).sum()
print((percentageID0 / fullcount) * 100)
print((percentageID1 / fullcount) * 100)
Think I am getting mixed up with the groupby/index format.
To calculate a cumulative percentage, first build a DataFrame (here 'data_frame') by passing the values to pd.DataFrame() and naming the columns. Then use the built-in cumsum() and sum() methods: the cumulative percentage is the cumulative sum divided by the total.
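The cumulative-percentage steps described above could be sketched as follows. The column name 'value' and the numbers are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical values, used only to illustrate the technique.
data_frame = pd.DataFrame({"value": [10, 20, 30, 40]})

# Running total divided by the grand total, expressed as a percentage.
data_frame["cum_percent"] = (
    100 * data_frame["value"].cumsum() / data_frame["value"].sum()
)

print(data_frame)
```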
How do you group by the index in pandas? Pass the name of the DataFrame's index to groupby() to group rows on that index. DataFrame.groupby() takes a string or a list to specify the columns or index levels to group by.
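As a rough illustration of grouping on an index, the sketch below builds a small frame whose index is named 'ID'. The data and the choice of 'ID' as the index name are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: "ID" is the index name here purely for illustration.
df = pd.DataFrame(
    {"Stuff1": [23, 54, 45, 38]},
    index=pd.Index([1, 1, 0, 0], name="ID"),
)

# A named index level can be passed to groupby() like a column name...
by_name = df.groupby("ID")["Stuff1"].sum()

# ...or selected by position with level=0.
by_level = df.groupby(level=0)["Stuff1"].sum()

print(by_name)
```

Both calls should give the same result; the level= form is handy when the index has no name.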
Could be just this:
In [73]:
# Group by a list of columns; current pandas treats a tuple as a single key.
print(pd.DataFrame({'Percentage': df.groupby(['ID', 'Feature']).size() / len(df)}))
            Percentage
ID Feature
0  False           0.2
   True            0.3
1  False           0.3
   True            0.2
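For a version that can be run end to end, here is a minimal sketch that rebuilds the ten sample rows from the question (the "..n" row is left out) and splits the result into one table per ID. The names percent and per_id are introduced here for illustration only.

```python
import pandas as pd

# The ten sample rows from the question (the "..n" row is omitted).
mydf = pd.DataFrame({
    "Feature": [True, True, False, True, False, False, True, False, False, True],
    "ID":      [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    "Stuff1":  [23, 54, 45, 38, 32, 59, 37, 76, 32, 23],
    "Stuff2":  [12, 12, 67, 29, 24, 39, 32, 65, 12, 15],
})

# Share of all rows that falls into each (ID, Feature) pair, as a percentage.
percent = (
    mydf.groupby(["ID", "Feature"]).size()
    .mul(100)
    .div(len(mydf))
    .rename("Percent")
    .reset_index()
)

# One small table per ID, as asked for in the question.
per_id = {
    id_value: group.reset_index(drop=True)
    for id_value, group in percent.groupby("ID")
}

for id_value, table in per_id.items():
    print(table)
```

This divides by the total number of rows, which matches the expected output in the question (the two values for each ID add up to 50%). If the percentages should instead add up to 100% within each ID, dividing by the size of each ID group would be the natural variation.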