 

Pandas Percentage count on a DataFrame groupby

Tags:

python

pandas

I have a DataFrame (mydf) along the lines of the following:

Index   Feature  ID  Stuff1  Stuff2
1       True     1   23      12
2       True     1   54      12
3       False    0   45      67
4       True     0   38      29
5       False    1   32      24
6       False    1   59      39
7       True     0   37      32
8       False    0   76      65
9       False    1   32      12
10      True     0   23      15
..n     True     1   21      99
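A minimal sketch for reproducing the example: it rebuilds the ten rows shown above as mydf, assuming Index is the row index rather than a regular column. The ..n continuation rows are left out, so any percentages computed from it are out of 10 rows.

```python
import pandas as pd

# First ten rows of the example; the "..n" continuation rows are omitted.
# Assumption: "Index" is the row index, not a data column.
mydf = pd.DataFrame(
    {
        "Feature": [True, True, False, True, False, False, True, False, False, True],
        "ID": [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
        "Stuff1": [23, 54, 45, 38, 32, 59, 37, 76, 32, 23],
        "Stuff2": [12, 12, 67, 29, 24, 39, 32, 65, 12, 15],
    },
    index=pd.RangeIndex(1, 11, name="Index"),
)
print(mydf)
```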

I am trying to calculate the percentage of True and False values of Feature for each ID (0 or 1), as a share of all rows, and I am looking for two outputs, one for each ID:

Feature  ID  Percent
True     1   20%
False    1   30%

Feature  ID  Percent
True     0   30%
False    0   20%
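Assuming the percentages are taken over all rows (which is what the expected numbers above add up to for the ten sample rows), one way to print a separate table per ID is to compute every (ID, Feature) percentage with groupby(...).size() and then slice out each ID with xs(). This is a sketch on the ten sample rows, not the only possible layout.

```python
import pandas as pd

# Sample data: the Feature and ID columns of the first ten rows.
mydf = pd.DataFrame({
    "Feature": [True, True, False, True, False, False, True, False, False, True],
    "ID": [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# Count rows per (ID, Feature) pair and express each count as a
# percentage of all rows (multiply first to keep the arithmetic exact).
percent = mydf.groupby(["ID", "Feature"]).size() * 100 / len(mydf)

# Print one small table per ID, in the order used in the question.
for id_value in (1, 0):
    # xs drops the ID level, leaving a Series indexed by Feature;
    # sorting descending puts True before False.
    sub = percent.xs(id_value, level="ID").sort_index(ascending=False)
    table = pd.DataFrame({"Feature": sub.index, "ID": id_value, "Percent": sub.values})
    table["Percent"] = table["Percent"].map("{:.0f}%".format)
    print(table.to_string(index=False))
    print()
```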

I have made a few attempts, but I end up with a count for every column and then a percentage for every column.

Here's my bad attempt:

percentageID0 = mydf[ mydf['ID']==0 ].set_index(['Feature']).count()
percentageID1 = mydf[ mydf['ID']==1 ].set_index(['Feature']).count()
fullcount = (mydf.groupby(['ID']).count()).sum()

print(percentageID0 / fullcount * 100)
print(percentageID1 / fullcount * 100)

I think I am getting mixed up with the groupby/index format.

asked Aug 20 '15 at 15:08 by MikG
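The attempt above returns numbers for every column because DataFrame.count() counts the non-null values in each column separately. A minimal sketch of the same two-step idea that keeps only the Feature column, using value_counts() and dividing by the total row count instead of a per-column total (assuming the ten sample rows):

```python
import pandas as pd

mydf = pd.DataFrame({
    "Feature": [True, True, False, True, False, False, True, False, False, True],
    "ID": [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    "Stuff1": [23, 54, 45, 38, 32, 59, 37, 76, 32, 23],
    "Stuff2": [12, 12, 67, 29, 24, 39, 32, 65, 12, 15],
})

# count() works column by column, so it returns one number per column.
print(mydf[mydf["ID"] == 0].count())

# Select only the Feature column, count True/False per ID, and divide
# by the total number of rows in the whole DataFrame.
total = len(mydf)
percentageID0 = mydf.loc[mydf["ID"] == 0, "Feature"].value_counts() * 100 / total
percentageID1 = mydf.loc[mydf["ID"] == 1, "Feature"].value_counts() * 100 / total
print(percentageID0)
print(percentageID1)
```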


People also ask

How do you calculate cumulative percentage in pandas?

First create a DataFrame (for example data_frame = pd.DataFrame(...)) holding the values, then divide the column's cumsum() by its sum() and multiply by 100. The result is the running percentage of the total at each row.
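A minimal sketch of that recipe; the column name value and the numbers are made up for illustration:

```python
import pandas as pd

# Hypothetical data: a single column of values to accumulate.
data_frame = pd.DataFrame({"value": [10, 20, 30, 40]})

# Running total scaled to percent, divided by the grand total.
data_frame["cum_percent"] = data_frame["value"].cumsum() * 100 / data_frame["value"].sum()
print(data_frame)
```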

Can you group by the index in pandas?

Yes. Pass the name of the index (or of an index level) to DataFrame.groupby() to group rows by the index; the level= parameter does the same thing explicitly. groupby() accepts a single string or a list of strings naming the group columns or index levels.
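A short sketch of both spellings on a hypothetical frame whose index is named ID (passing an index level name through by= requires a reasonably recent pandas; level= is the explicit form):

```python
import pandas as pd

# Hypothetical frame whose row index is named "ID".
df = pd.DataFrame(
    {"Stuff1": [23, 54, 45, 38]},
    index=pd.Index([1, 1, 0, 0], name="ID"),
)

# Group by the index level name passed as the `by` key...
by_name = df.groupby("ID")["Stuff1"].sum()
# ...or name the index level explicitly with `level`.
by_level = df.groupby(level="ID")["Stuff1"].sum()
print(by_name)
```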


1 Answer

Could be just this:

import pandas as pd

# Count rows per (ID, Feature) pair and divide by the total row count.
print(pd.DataFrame({'Percentage': mydf.groupby(['ID', 'Feature']).size() / len(mydf)}))
            Percentage
ID Feature            
0  False           0.2
   True            0.3
1  False           0.3
   True            0.2
answered Sep 28 '22 at 09:09 by CT Zhu
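For comparison, pd.crosstab can lay the same numbers out as an ID-by-Feature table. A sketch on the ten sample rows: normalize='all' reproduces the share-of-all-rows percentages from the groupby answer, while normalize='index' computes a different quantity, the True/False split within each ID, in case that is what is wanted.

```python
import pandas as pd

mydf = pd.DataFrame({
    "Feature": [True, True, False, True, False, False, True, False, False, True],
    "ID": [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# normalize="all" divides every cell by the total row count, the same
# as groupby(["ID", "Feature"]).size() / len(mydf).
share_of_all = pd.crosstab(mydf["ID"], mydf["Feature"], normalize="all") * 100

# normalize="index" divides by each ID's own row count instead,
# so each row of the table sums to 100.
share_within_id = pd.crosstab(mydf["ID"], mydf["Feature"], normalize="index") * 100

print(share_of_all)
print(share_within_id)
```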